Similar articles
1.
The paper considers non-parametric maximum likelihood estimation of the failure time distribution for interval-censored data subject to misclassification. Such data can arise from two types of observation scheme: either observations continue until the first positive test result, or tests continue regardless of the test results. In the former case, the misclassification probabilities must be known, whereas in the latter case, joint estimation of the event-time distribution and misclassification probabilities is possible. The regions to which the support of the maximum likelihood estimate is restricted are derived. Algorithms for computing the maximum likelihood estimate are investigated and it is shown that algorithms appropriate for computing non-parametric mixing distributions perform better than an iterative convex minorant algorithm in terms of time to absolute convergence. A profile likelihood approach is proposed for joint estimation. The methods are illustrated on a data set relating to the onset of cardiac allograft vasculopathy in post-heart-transplantation patients.

2.
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

3.
This paper presents a new method to estimate the quantiles of generic statistics by combining the concept of random weighting with importance resampling. This method converts the problem of quantile estimation to a dual problem of tail probability estimation. Random weighting theories are established to calculate the optimal resampling weights for estimating tail probabilities via sequential variance minimization. Subsequently, the quantile estimate is constructed using the obtained optimal resampling weights. Experimental results on real and simulated data sets demonstrate that the proposed random weighting method can effectively estimate the quantiles of generic statistics.

4.
The classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. In this paper, we find the maximum-likelihood estimates of the parameters of the mixture of two inverse Weibull distributions by using classified and unclassified observations. Next, we estimate the nonlinear discriminant function of the underlying model. Also, we calculate the total probabilities of misclassification as well as the percentage bias. In addition, we investigate the performance of all results through a series of simulation experiments by means of relative efficiencies. Finally, we analyse some simulated and real data sets through the findings of the paper.

5.
Ion Grama 《Statistics》2019,53(4):807-838
We propose an extension of the regular Cox proportional hazards model which allows the estimation of the probabilities of rare events. It is known that when the data are heavily censored, the estimation of the tail of the survival distribution is not reliable. To improve the estimate of the baseline survival function in the range of the largest observed data and to extend it beyond that range, we adjust the tail of the baseline distribution beyond some threshold by an extreme value model under appropriate assumptions. The survival distributions conditional on the covariates are easily computed from the baseline. A procedure allowing an automatic choice of the threshold and an aggregated estimate of the survival probabilities are also proposed. The performance is studied by simulations and an application to two data sets is given.

6.
It is known that patients may cease participating in a longitudinal study and become lost to follow-up. The objective of this article is to present a Bayesian model to estimate the malaria transition probabilities considering individuals lost to follow-up. We consider a homogeneous population, and it is assumed that the considered period of time is small enough to avoid two or more transitions from one state of health to another. The proposed model is based on a Gibbs sampling algorithm that uses information on individuals lost to follow-up at the end of the longitudinal study. To simulate the unknown number of individuals with positive and negative states of malaria at the end of the study and lost to follow-up, two latent variables were introduced in the model. We used a real data set and a simulated data set to illustrate the application of the methodology. The proposed model showed a good fit to these data sets, and the algorithm did not show problems of convergence or lack of identifiability. We conclude that the proposed model is a good alternative for estimating probabilities of transitions from one state of health to the other in studies with low adherence to follow-up.

7.
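When units of a binary (two-class) population are subject to misclassification, the sample proportion is a biased estimator of the population proportion. Two methods for adjusting the proportion estimate are given, the double-sample method and the maximum likelihood method, which supplement the treatment of proportion estimation in classical sampling theory.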
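
The bias described in this entry can be illustrated with a short sketch. It is not the double-sample or maximum likelihood method of the article; it assumes instead that the misclassification rates are known, and applies the standard moment correction that follows from E[p_obs] = p·Se + (1 − p)·(1 − Sp). The function names and the parameters `sensitivity` and `specificity` are hypothetical, chosen for illustration only.

```python
def observed_proportion(p_true, sensitivity, specificity):
    """Expected proportion classified as positive when labels are misclassified."""
    return p_true * sensitivity + (1.0 - p_true) * (1.0 - specificity)


def adjusted_proportion(p_obs, sensitivity, specificity):
    """Correct an observed proportion for misclassification.

    Assumes the sensitivity and specificity are known constants
    (an illustrative assumption, not part of the article's methods).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("classifier must be better than chance (Se + Sp > 1)")
    p = (p_obs + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it back.
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    biased = observed_proportion(0.20, sensitivity=0.90, specificity=0.95)
    print(biased)
    print(adjusted_proportion(biased, sensitivity=0.90, specificity=0.95))
```

With a true proportion of 0.20, the expected observed proportion differs from 0.20, and inverting the relationship recovers the true value up to floating-point error.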

8.
This article considers multinomial data subject to misclassification in the presence of covariates which affect both the misclassification probabilities and the true classification probabilities. A subset of the data may be subject to a secondary measurement according to an infallible classifier. Computations are carried out in a Bayesian setting where it is seen that the prior has an important role in driving the inference. In addition, a new and less problematic definition of nonidentifiability is introduced and is referred to as hierarchical nonidentifiability.

9.
Inference for the state occupation probabilities, given a set of baseline covariates, is an important problem in survival analysis and time-to-event multistate data. We introduce an approach based on an inverse censoring probability re-weighted semi-parametric single index model to estimate conditional state occupation probabilities of a given individual in a multistate model under right-censoring. Besides obtaining a temporal regression function, we also test the potential time-varying effect of a baseline covariate on future state occupation. We show that the proposed technique has desirable finite-sample performance and is competitive with three other existing approaches. We illustrate the proposed methodology using two different data sets. First, we re-examine a well-known data set dealing with leukemia patients undergoing bone marrow transplant with various state transitions. Our second illustration is based on data from a study involving the functional status of a set of spinal cord injured patients undergoing a rehabilitation program.

10.
The continual reassessment method (CRM) is a commonly used dose-finding design for phase I clinical trials. Practical applications of this method have been restricted by two limitations: (1) the requirement that the toxicity outcome needs to be observed shortly after the initiation of the treatment; and (2) the potential sensitivity to the prespecified toxicity probability at each dose. To overcome these limitations, we naturally treat the unobserved toxicity outcomes as missing data, and use the expectation-maximization (EM) algorithm to estimate the dose toxicity probabilities based on the incomplete data to direct dose assignment. To enhance the robustness of the design, we propose prespecifying multiple sets of toxicity probabilities, each set corresponding to an individual CRM model. We carry out these multiple CRMs in parallel, across which model selection and model averaging procedures are used to make more robust inference. We evaluate the operating characteristics of the proposed robust EM-CRM designs through simulation studies and show that the proposed methods satisfactorily resolve both limitations of the CRM. Besides improving the MTD selection percentage, the new designs dramatically shorten the duration of the trial, and are robust to the prespecification of the toxicity probabilities.

11.
A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived for the small-sample, high-dimensional setting. This method occupies an intermediate position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first; then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is compared with that of other classical methods through application to data.

12.
The von Mises-Fisher distribution is widely used for modeling directional data. In this article, we derive the discriminant rules based on this distribution to assign objects into pre-existing classes. We determine a distance between two von Mises-Fisher populations and we calculate estimates of the misclassification probabilities. We also analyze the behavior of the distance between two von Mises-Fisher populations and of the estimates of the misclassification probabilities when we modify the parameters of the populations, the sample size or the dimension of the sphere. Finally, we present an example with real spherical data available in the literature.

13.
It is well known that statistical classifiers trained on imbalanced data lead to low true positive rates and select significant variables inconsistently. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating the misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative analysis of suboptimal health state data in traditional Chinese medicine is discussed to show the application of the proposed method.

14.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.

15.
This paper develops a method for handling two-class classification problems with highly unbalanced class sizes and misclassification costs. When the class sizes are highly unbalanced and the minority class represents a rare event, conventional classification methods tend to strongly favour the majority class, resulting in very low detection of the minority class. A method is proposed to determine the optimal cut-off for asymmetric misclassification costs and for unbalanced class sizes. Monte Carlo simulations show that this proposal performs better than the method based on the notion of classification accuracy. Finally, the proposed method is applied to empirical data on Italian small and medium enterprises to classify them into default and non-default groups.
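
As a rough illustration of why asymmetric costs shift the cut-off, the sketch below uses the textbook expected-cost rule: with calibrated probabilities, predicting the positive class minimizes expected cost when p > c_FP / (c_FP + c_FN). This is a generic decision-theoretic rule and not necessarily the procedure developed in the article; the function names and cost parameters are hypothetical.

```python
def cost_sensitive_cutoff(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost.

    Assumes calibrated class probabilities and zero cost for correct
    decisions (illustrative assumptions).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(probabilities, cost_fp, cost_fn):
    """Label each case 1 (minority/positive) when its probability exceeds the cut-off."""
    threshold = cost_sensitive_cutoff(cost_fp, cost_fn)
    return [1 if p > threshold else 0 for p in probabilities]


if __name__ == "__main__":
    # A missed default (false negative) is taken to be 9 times as costly
    # as a false alarm, so the cut-off drops well below 0.5.
    print(cost_sensitive_cutoff(cost_fp=1.0, cost_fn=9.0))
    print(classify([0.05, 0.2, 0.6], cost_fp=1.0, cost_fn=9.0))
```

With equal costs the rule reduces to the usual 0.5 threshold, which is the accuracy-based choice that tends to favour the majority class.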

16.
A nonparametric estimate for the posterior probabilities in the classification problem using multivariate thin plate splines is proposed. This method presents a nonparametric alternative to logistic discrimination as well as to survival curve estimation. The degree of smoothness of the estimate is determined from the data using generalized cross-validation.

17.
Classification rules with a reserve judgment option provide a way to satisfy constraints on the misclassification probabilities when there is a high degree of overlap among the populations. Constructing rules which maximize the probability of correct classification while satisfying such constraints is a difficult optimization problem. This paper uses the form of the optimal solution to develop a relatively simple and computationally fast method for three populations which has a nonparametric quality in controlling the misclassification probabilities. Simulations demonstrate that this procedure performs well.

18.
This article considers misclassification of categorical covariates in the context of regression analysis; if unaccounted for, such errors usually result in mis-estimation of model parameters. With the presence of additional covariates, we exploit the fact that explicitly modelling non-differential misclassification with respect to the response leads to a mixture regression representation. Under the framework of mixture of experts, we enable the reclassification probabilities to vary with other covariates, a situation commonly caused by misclassification that is differential on certain covariates and/or by dependence between the misclassified and additional covariates. Using Bayesian inference, the mixture approach combines learning from data with external information on the magnitude of errors when it is available. In addition to proving the theoretical identifiability of the mixture of experts approach, we study the amount of efficiency loss resulting from covariate misclassification and the usefulness of external information in mitigating such loss. The method is applied to adjust for misclassification on self-reported cocaine use in the Longitudinal Studies of HIV-Associated Lung Infections and Complications.

19.
Highly skewed and non-negative data can often be modeled by the delta-lognormal distribution in fisheries research. However, the coverage probabilities of extant interval estimation procedures are less than satisfactory for small sample sizes and highly skewed data. We propose a heuristic method of estimating confidence intervals for the mean of the delta-lognormal distribution. This heuristic method uses an asymptotic generalized pivotal quantity to construct a generalized confidence interval for the mean of the delta-lognormal distribution. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities, expected interval lengths and reasonable relative biases. Finally, the proposed method is applied to red cod density data as a demonstration.
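
For orientation, the quantity targeted by the interval in this entry is the delta-lognormal mean, (1 − δ)·exp(μ + σ²/2), where δ is the probability of a zero and μ, σ² are the mean and variance of the log of the positive values. The sketch below computes only a simple plug-in point estimate; it does not implement the generalized confidence interval proposed in the article. The function name is hypothetical, and the use of the sample variance (rather than, say, the maximum likelihood variance) is an assumption.

```python
import math
import statistics


def delta_lognormal_mean(values):
    """Plug-in estimate of the mean of a delta-lognormal sample.

    Zeros are modelled by a point mass; positive values are assumed
    lognormal. Illustrative sketch only, not the article's estimator.
    """
    if any(v < 0 for v in values):
        raise ValueError("delta-lognormal data must be non-negative")
    positives = [v for v in values if v > 0]
    if len(positives) < 2:
        raise ValueError("need at least two positive observations")
    zero_fraction = 1.0 - len(positives) / len(values)   # estimate of delta
    logs = [math.log(v) for v in positives]
    mu = statistics.mean(logs)                            # estimate of mu
    sigma2 = statistics.variance(logs)                    # sample variance of logs
    return (1.0 - zero_fraction) * math.exp(mu + sigma2 / 2.0)


if __name__ == "__main__":
    # Hypothetical catch densities with several zero hauls.
    print(delta_lognormal_mean([0.0, 0.0, 1.2, 3.4, 0.8, 0.0, 5.1]))
```

The exp(σ²/2) factor is what makes the mean sensitive to skewness, which is one reason interval estimates for it are hard to calibrate in small samples.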

20.
On Maximum Depth and Related Classifiers (citations: 1 total, 0 self-citations, 1 by others)
Over the last couple of decades, data depth has emerged as a powerful exploratory and inferential tool for multivariate data analysis with wide-spread applications. This paper investigates the possible use of different notions of data depth in non-parametric discriminant analysis. First, we consider the situation where the prior probabilities of the competing populations are all equal and investigate classifiers that assign an observation to the population with respect to which it has the maximum location depth. We propose a different depth-based classification technique for unequal prior problems, which is also useful for equal prior cases, especially when the populations have different scatters and shapes. We use some simulated data sets as well as some benchmark real examples to evaluate the performance of these depth-based classifiers. The large-sample behaviour of the misclassification rates of these depth-based non-parametric classifiers has been derived under appropriate regularity conditions.
