Similar Articles
20 similar articles found
1.
ABSTRACT

Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, a Bayesian approach has been developed to model misclassified ordinal response data. Two regression models have been considered to incorporate misclassification in the categorical response. Specifically, probit and logit models have been developed. The computational difficulties have been avoided by using data augmentation. This idea is exploited to derive efficient Markov chain Monte Carlo methods. Although the method is proposed for ordered categories, it can also be implemented for unordered ones in a simple way. The model performance is shown through a simulation-based example and the analysis of the motivating study.

2.
Measurement error and misclassification arise commonly in various data collection processes. It is well known that ignoring these features in the data analysis usually leads to biased inference. Within the generalized linear model setting, Yi et al. [Functional and structural methods with mixed measurement error and misclassification in covariates. J Am Stat Assoc. 2015;110:681–696] developed inference methods that adjust for the effects of measurement error in continuous covariates and misclassification in discrete covariates simultaneously for the scenario where validation data are available. The augmented simulation-extrapolation (SIMEX) approach they developed generalizes the usual SIMEX method, which is applicable only to continuous error-prone covariates. To implement this method, we develop an R package, augSIMEX, for public use. Simulation studies are conducted to illustrate the use of the algorithm. This package is available at CRAN.
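Below is a minimal Python sketch of the basic SIMEX (simulation-extrapolation) idea for a single error-prone continuous covariate. It is not the augSIMEX package and uses a plain linear model; the sample size, error variance, number of remeasurement replicates, and the quadratic extrapolant are all illustrative assumptions.

```python
import numpy as np

# Toy setup: y = b0 + b1*x + e, but x is observed as w = x + u with
# known measurement-error variance su2 (assumed values throughout).
rng = np.random.default_rng(0)
n, b0, b1, su2 = 2000, 1.0, 2.0, 0.5
x = rng.normal(size=n)
y = b0 + b1 * x + rng.normal(scale=0.5, size=n)
w = x + rng.normal(scale=np.sqrt(su2), size=n)   # error-prone covariate

def ols_slope(z, y):
    """Slope from a simple least-squares fit of y on z."""
    design = np.column_stack([np.ones_like(z), z])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

# Simulation step: add extra error of variance lambda*su2 and record the
# average naive slope for each lambda.
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 200
slopes = []
for lam in lambdas:
    reps = [ols_slope(w + rng.normal(scale=np.sqrt(lam * su2), size=n), y)
            for _ in range(B)]
    slopes.append(np.mean(reps))

# Extrapolation step: fit a quadratic in lambda and evaluate it at
# lambda = -1, which corresponds to removing the measurement error.
coef = np.polyfit(lambdas, slopes, deg=2)
print("naive slope:", slopes[0])
print("SIMEX-corrected slope:", np.polyval(coef, -1.0))
```

The corrected slope should land noticeably closer to the true value than the attenuated naive estimate; the augmented SIMEX of the abstract extends this idea to handle misclassified discrete covariates as well.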

3.
Abstract

We consider the classification of high-dimensional data under the strongly spiked eigenvalue (SSE) model. We create a new classification procedure on the basis of the high-dimensional eigenstructure in the high-dimension, low-sample-size context. We propose a distance-based classification procedure that uses a data transformation. We also prove that our proposed classification procedure has the consistency property for misclassification rates. We discuss the performance of our classification procedure in simulations and in real data analyses using microarray data sets.

4.
ABSTRACT

Classification rules with a reserve judgment option provide a way to satisfy constraints on the misclassification probabilities when there is a high degree of overlap among the populations. Constructing rules which maximize the probability of correct classification while satisfying such constraints is a difficult optimization problem. This paper uses the form of the optimal solution to develop a relatively simple and computationally fast method for three populations which has a nonparametric quality in controlling the misclassification probabilities. Simulations demonstrate that this procedure performs well.

5.
ABSTRACT

In this article we consider two methods for combining a number of individual classifiers in order to construct more effective classification rules. The effectiveness of these methods, as measured by a comparison of their misclassification error rates with those of the individual classifiers, is assessed via a number of examples that involve simulated data. We also compare the results to those of two existing combining procedures.

6.
The effects of applying the normal classificatory rule to a nonnormal population are studied here. These are assessed through the distribution of the misclassification errors in the case of the Edgeworth-type distribution. Both theoretical and empirical results are presented. An examination of the latter shows that the effects of this type of nonnormality are marginal. The probability of misclassification of an observation from Π1, using the appropriate LR rule, is always larger than that using the normal approximation (μ1 < μ2). The converse holds for the misclassification of an observation from Π2. Overall error rates are not affected by the skewness factor to any great extent.

7.
Abstract

In this paper, we propose an outlier-detection approach that uses the properties of an intercept estimator in a difference-based regression model (DBRM), which we first introduce. The DBRM is built on multiple linear regression and is designed to detect outliers in a multiple linear regression. Our outlier-detection approach uses only the intercept; it does not require estimates of the other parameters in the DBRM. We first employ a difference-based intercept estimator to study the outlier-detection problem in a multiple regression model. We compared our approach with several existing methods in a simulation study, and the results suggest that our approach outperformed the others. We also demonstrated the advantage of our approach using a real data application. Our approach can be extended to nonparametric regression models for outlier detection.

8.
ABSTRACT

In this study, Monte Carlo simulation experiments were employed to examine the performance of four statistical two-group classification methods when the data distributions are skewed and misclassification costs are unequal, conditions frequently encountered in business and economic applications. The classification methods studied are the linear and quadratic parametric, nearest-neighbor, and logistic regression methods. It was found that when skewness is moderate, the parametric methods tend to give the best results. Depending on the specific data condition, when skewness is high, either the linear parametric, logistic regression, or nearest-neighbor method gives the best results. When misclassification costs differ widely across groups, the linear parametric method is favored over the other methods for many of the data conditions studied.

9.
When classification rules are constructed using sample estimates, it is known that the probability of misclassification is not minimized. This article introduces a biased minimum χ² rule to classify items from a multivariate normal population. Using the principle of variance reduction, the probability of misclassification is reduced when the biased procedure is employed. Results of sampling experiments over a broad range of conditions are provided to demonstrate this improvement.

10.
This article considers misclassification of categorical covariates in the context of regression analysis; if unaccounted for, such errors usually result in mis-estimation of model parameters. With the presence of additional covariates, we exploit the fact that explicitly modelling non-differential misclassification with respect to the response leads to a mixture regression representation. Under the framework of mixture of experts, we enable the reclassification probabilities to vary with other covariates, a situation commonly caused by misclassification that is differential on certain covariates and/or by dependence between the misclassified and additional covariates. Using Bayesian inference, the mixture approach combines learning from data with external information on the magnitude of errors when it is available. In addition to proving the theoretical identifiability of the mixture of experts approach, we study the amount of efficiency loss resulting from covariate misclassification and the usefulness of external information in mitigating such loss. The method is applied to adjust for misclassification on self-reported cocaine use in the Longitudinal Studies of HIV-Associated Lung Infections and Complications.
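As a brief sketch of the mixture representation referred to above (notation ours): write X for the true categorical covariate, X* for its misclassified version, Z for the additional covariates, and assume misclassification is non-differential with respect to the response Y, so that Y is independent of X* given (X, Z). Then the observed-data regression is

f(y | x*, z) = Σx Pr(X = x | X* = x*, Z = z) · f(y | x, z),

a mixture of the true-covariate regressions in which the reclassification probabilities act as mixing weights that may vary with z, which is the mixture-of-experts structure exploited in the abstract.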

11.
ABSTRACT

Background: Many exposures in epidemiological studies have nonlinear effects, and the problem is to choose an appropriate functional relationship between such exposures and the outcome. One common approach is to investigate several parametric transformations of the covariate of interest and to select a posteriori the function that fits the data best. However, such an approach may result in an inflated Type I error. Methods: Through a simulation study, we generated data from Cox models with different transformations of a single continuous covariate. We investigated the Type I error rate and the power of the likelihood ratio test (LRT) corresponding to three different procedures that considered the same set of parametric dose-response functions. The first, unconditional, approach did not involve any model selection, while the second, conditional, approach was based on a posteriori selection of the parametric function. The proposed third approach was similar to the second except that it used a corrected critical value for the LRT to ensure a correct Type I error. Results: The Type I error rate of the second approach was two times higher than the nominal size. For simple monotone dose-response, the corrected test had power similar to the unconditional approach, while for non-monotone dose-response it had higher power. A real-life application, focused on the effect of body mass index on the risk of coronary heart disease death, illustrated the advantage of the proposed approach. Conclusion: Our results confirm that selecting the functional form of the dose-response a posteriori induces a Type I error inflation. The corrected procedure, which can be applied in a wide range of situations, may provide a good trade-off between Type I error and power.
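A minimal Python sketch of why a posteriori selection inflates the Type I error. For simplicity it uses ordinary linear regression rather than the Cox models of the abstract, and the set of transformations, sample size, and number of replications are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Under H0 the covariate has no effect.  For each simulated data set we
# try several transformations of the covariate, keep the smallest p-value
# (a posteriori selection), and test it at the nominal 5% level.
rng = np.random.default_rng(1)
transforms = [lambda x: x, np.log, np.sqrt, lambda x: x**2]

def slope_pvalue(z, y):
    """Two-sided p-value for the slope in a simple linear regression."""
    return stats.linregress(z, y).pvalue

n, nsim, rejections = 200, 2000, 0
for _ in range(nsim):
    x = rng.uniform(0.5, 3.0, size=n)
    y = rng.normal(size=n)                    # no true dose-response
    best_p = min(slope_pvalue(f(x), y) for f in transforms)
    rejections += best_p < 0.05

# The empirical rejection rate is typically above the nominal 0.05,
# illustrating the inflation that the corrected critical value addresses.
print("empirical Type I error:", rejections / nsim)
```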

12.
ABSTRACT

The parameters of stable laws can be estimated using a regression-based approach involving the empirical characteristic function. One approach is to use a fixed number of points for all parameters of the distribution when estimating the characteristic function. In this work, results are derived where all points in an interval are used to estimate the empirical characteristic function, giving least squares estimators of a linear function of the parameters based on an infinite number of observations. It was found that the procedure performs very well in small samples.

13.
In the quantitative group testing problem, the use of the group mean to identify whether the group maximum is greater than a prefixed threshold (an infected group) is analyzed, using n independent and identically distributed individuals. Under these conditions, it is shown that the information in the mean is sufficient to classify each group as infected or healthy with low probability of misclassification when the underlying distribution is a one-sided heavy-tailed distribution.

14.
Previous work has been carried out on the use of double-sampling schemes for inference from categorical data subject to misclassification. The double-sampling schemes utilize a sample of n units classified by both a fallible and a true device and another sample of n2 units classified only by a fallible device. In actual applications, one often has available a third sample of n1 units, which is classified only by the true device. In this article we develop techniques for fitting log-linear models under various misclassification structures for a general triple-sampling scheme. The estimation is by maximum likelihood and the fitted models are hierarchical. The methodology is illustrated by applying it to data in traffic safety research from a study on the effectiveness of belts in reducing injuries.

15.
In partly linear models, the dependence of the response y on (xᵀ, t) is modeled through the relationship y = xᵀβ + g(t) + ε, where ε is independent of (xᵀ, t). We are interested in developing an estimation procedure that allows us to combine the flexibility of the partly linear models, studied by several authors, while including some variables that belong to a non-Euclidean space. The motivating application of this paper deals with explaining atmospheric SO2 pollution incidents using these models when some of the predictive variables lie on a cylinder. In this paper, the estimators of β and g are constructed when the explanatory variables t take values on a Riemannian manifold, and the asymptotic properties of the proposed estimators are obtained under suitable conditions. We illustrate the use of this estimation approach using an environmental data set and we explore the performance of the estimators through a simulation study.

16.
The paper considers non-parametric maximum likelihood estimation of the failure time distribution for interval-censored data subject to misclassification. Such data can arise from two types of observation scheme: either observations continue until the first positive test result, or tests continue regardless of the test results. In the former case, the misclassification probabilities must be known, whereas in the latter case, joint estimation of the event-time distribution and the misclassification probabilities is possible. The regions on which the maximum likelihood estimate can have support are derived. Algorithms for computing the maximum likelihood estimate are investigated, and it is shown that algorithms appropriate for computing non-parametric mixing distributions perform better than an iterative convex minorant algorithm in terms of time to absolute convergence. A profile likelihood approach is proposed for joint estimation. The methods are illustrated on a data set relating to the onset of cardiac allograft vasculopathy in post-heart-transplantation patients.

17.
Summary: In this paper the complexity of high-dimensional data with cyclical variation is reduced using analysis of variance and factor analysis. It is shown that predicting a small number of main cyclical factors is more useful than forecasting all time points separately, as is usually done by seasonal time series models. To give an example of this approach, we analyze the quarter-hourly electricity demand of industrial customers in Germany. The need for such predictions results from the liberalization of the German electricity market in 1998, following legal requirements of the EC in 1996.

18.
In this paper, a two-parameter discrete distribution named the Misclassified Size Biased Discrete Lindley distribution is defined for the situation of misclassification in which some of the observations corresponding to x = c + 1 are reported as x = c with misclassification error α. Different estimation methods, namely maximum likelihood estimation, moment estimation, and Bayes estimation, are considered to estimate the parameters of the distribution. These methods are compared by mean square error through a simulation study with varying sample sizes. The general form of the factorial moments is also obtained for the distribution. A real-life data set is used to fit the Misclassified Size Biased Discrete Lindley distribution.

19.
ABSTRACT

We study partial linear models where the linear covariates are endogenous and give rise to an over-identification problem. We propose combining the profile principle with local linear approximation and the generalized method of moments (GMM) to estimate the parameters of interest. We show that the profiled GMM estimators are root-n consistent and asymptotically normally distributed. By appropriately choosing the weight matrix, the estimators can attain the efficiency bound. We further consider variable selection by using the moment restrictions imposed on the endogenous variables when the dimension of the covariates may diverge with the sample size, and we propose a penalized GMM procedure, which is shown to have the sparsity property. We establish asymptotic normality of the resulting estimators of the nonzero parameters. Simulation studies are presented to assess the finite-sample performance of the proposed procedure.
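For readers unfamiliar with the GMM ingredient used above, here is a minimal Python sketch of two-step GMM for a purely linear model with one endogenous regressor and two instruments (over-identified). It is not the authors' profiled partly linear estimator; the data-generating values are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=(n, 2))                       # instruments
u = rng.normal(size=n)                            # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)  # endogenous regressor
y = 2.0 * x + u                                   # true coefficient is 2

X = x.reshape(-1, 1)
A = X.T @ z                                       # cross-moment X'Z

# Step 1: GMM with weight (Z'Z)^{-1}, equivalent to 2SLS here.
W1 = np.linalg.inv(z.T @ z)
beta1 = np.linalg.solve(A @ W1 @ A.T, A @ W1 @ z.T @ y)

# Step 2: re-weight with the inverse of the estimated moment covariance.
e = y - X @ beta1
S = (z * e[:, None]).T @ (z * e[:, None]) / n
Sinv = np.linalg.inv(S)
beta2 = np.linalg.solve(A @ Sinv @ A.T, A @ Sinv @ z.T @ y)

print("OLS (biased by endogeneity):", float(np.linalg.lstsq(X, y, rcond=None)[0][0]))
print("two-step GMM:", float(beta2[0]))
```

The profiled estimator of the abstract replaces the purely linear part with a partly linear one, profiling out the nonparametric component by local linear smoothing before applying the GMM machinery.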

20.
Estimated associations between an outcome variable and misclassified covariates tend to be biased when estimation methods that ignore the classification error are applied. Available methods to account for misclassification often require the use of a validation sample (i.e. a gold standard). In practice, however, such a gold standard may be unavailable or impractical. We propose a Bayesian approach to adjust for misclassification in a binary covariate in the random effect logistic model when a gold standard is not available. This Markov chain Monte Carlo (MCMC) approach uses two imperfect measures of a dichotomous exposure under the assumptions of conditional independence and non-differential misclassification. A simulated numerical example and a real clinical example are given to illustrate the proposed approach. Our results suggest that the estimated log odds of inpatient care and the corresponding standard deviation are much larger under our proposed method than under models that ignore misclassification. Ignoring misclassification produces downwardly biased estimates and understates uncertainty.
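As a brief sketch of the identification idea behind the two-surrogate approach above (notation ours): let X denote the true binary exposure and W1, W2 the two imperfect measures. Conditional independence of W1 and W2 given X, together with non-differential misclassification, gives

Pr(W1 = w1, W2 = w2) = Σx∈{0,1} Pr(X = x) · Pr(W1 = w1 | X = x) · Pr(W2 = w2 | X = x),

so the observed joint distribution of the two surrogates carries information about the exposure prevalence and the misclassification probabilities even though X itself is never observed; the Bayesian model combines this information with prior distributions in the MCMC fitting.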
