1.

Non-parametric maximum likelihood estimation of interval-censored failure time data subject to misclassification

Andrew C. Titman 《Statistics and Computing》2017,27(6):1585-1593

The paper considers non-parametric maximum likelihood estimation of the failure time distribution for interval-censored data subject to misclassification. Such data can arise from two types of observation scheme; either where observations continue until the first positive test result or where tests continue regardless of the test results. In the former case, the misclassification probabilities must be known, whereas in the latter case, joint estimation of the event-time distribution and misclassification probabilities is possible. The regions for which the maximum likelihood estimate can only have support are derived. Algorithms for computing the maximum likelihood estimate are investigated and it is shown that algorithms appropriate for computing non-parametric mixing distributions perform better than an iterative convex minorant algorithm in terms of time to absolute convergence. A profile likelihood approach is proposed for joint estimation. The methods are illustrated on a data set relating to the onset of cardiac allograft vasculopathy in post-heart-transplantation patients.

2.

SVM-like decision theoretical classification of high-dimensional vectors

David J. Bradshaw Marianna Pensky 《Journal of statistical planning and inference》2010

In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

3.

Random weighting-based quantile estimation via importance resampling

Wenhui Wei Shesheng Gao Yongmin Zhong Chengfan Gu Zhaohui Gao 《Communications in Statistics - Theory and Methods》2013,42(19):4820-4833

This paper presents a new method to estimate the quantiles of generic statistics by combining the concept of random weighting with importance resampling. This method converts the problem of quantile estimation to a dual problem of tail probabilities estimation. Random weighting theories are established to calculate the optimal resampling weights for estimation of tail probabilities via sequential variance minimization. Subsequently, the quantile estimation is constructed by using the obtained optimal resampling weights. Experimental results on real and simulated data sets demonstrate that the proposed random weighting method can effectively estimate the quantiles of generic statistics.

4.

Estimation of a discriminant function from a mixture of two inverse Weibull distributions

K. S. Sultan A. S. Al-Moisheer 《Journal of Statistical Computation and Simulation》2013,83(3):405-416

The classification of a random variable based on a mixture can be meaningfully discussed only if the class of all finite mixtures is identifiable. In this paper, we find the maximum-likelihood estimates of the parameters of the mixture of two inverse Weibull distributions by using classified and unclassified observations. Next, we estimate the nonlinear discriminant function of the underlying model. Also, we calculate the total probabilities of misclassification as well as the percentage bias. In addition, we investigate the performance of all results through a series of simulation experiments by means of relative efficiencies. Finally, we analyse some simulated and real data sets through the findings of the paper.

5.

Estimation of extreme survival probabilities with Cox model

Ion Grama 《Statistics》2019,53(4):807-838

We propose an extension of the regular Cox's proportional hazards model which allows the estimation of the probabilities of rare events. It is known that when the data are heavily censored, the estimation of the tail of the survival distribution is not reliable. To improve the estimate of the baseline survival function in the range of the largest observed data and to extend it outside, we adjust the tail of the baseline distribution beyond some threshold by an extreme value model under appropriate assumptions. The survival distributions conditioned to the covariates are easily computed from the baseline. A procedure allowing an automatic choice of the threshold and an aggregated estimate of the survival probabilities are also proposed. The performance is studied by simulations and an application on two data sets is given.

6.

A Bayesian model for estimating the malaria transition probabilities considering individuals lost to follow-up

Edson Zangiacomi Martinez Davi Casale Aragon Jorge Alberto Achcar 《Journal of applied statistics》2011,38(6):1303-1309

It is known that patients may cease participating in a longitudinal study and become lost to follow-up. The objective of this article is to present a Bayesian model to estimate the malaria transition probabilities considering individuals lost to follow-up. We consider a homogeneous population, and it is assumed that the considered period of time is small enough to avoid two or more transitions from one state of health to another. The proposed model is based on a Gibbs sampling algorithm that uses information on individuals lost to follow-up at the end of the longitudinal study. To simulate the unknown number of individuals with positive and negative states of malaria at the end of the study and lost to follow-up, two latent variables were introduced in the model. We used a real data set and a simulated data set to illustrate the application of the methodology. The proposed model showed a good fit to these data sets, and the algorithm did not show problems of convergence or lack of identifiability. We conclude that the proposed model is a good alternative to estimate probabilities of transitions from one state of health to the other in studies with low adherence to follow-up.

7.

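Measurement error models for categorical data

巩红禹 金勇进 贺本岚 《统计与信息论坛》2010,25(11):3-6

When units of a binary population are misclassified, the sample proportion is a biased estimator of the population proportion. Two methods for adjusting the proportion estimate are given, a double-sample method and maximum likelihood, supplementing the treatment of proportion estimation in classical sampling theory.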
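
The bias described in this abstract can be made concrete with a small sketch. It applies the standard textbook correction for a binary classifier whose error rates are already known, which is not the double-sample or maximum likelihood estimator of the article. The function name `adjusted_proportion` and the `sensitivity` and `specificity` parameters are assumptions introduced for illustration.

```python
def adjusted_proportion(observed: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed proportion for known misclassification rates.

    Assumption: E[observed] = p * sensitivity + (1 - p) * (1 - specificity),
    where p is the true population proportion. Solving for p gives the
    estimate returned below.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must beat chance: sensitivity + specificity > 1")
    p = (observed + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1], so clamp it.
    return min(1.0, max(0.0, p))


# True proportion 0.2 with sensitivity 0.9 and specificity 0.8 gives an
# expected observed proportion of 0.2 * 0.9 + 0.8 * 0.2 = 0.34.
print(adjusted_proportion(0.34, sensitivity=0.9, specificity=0.8))
```

In this example the naive sample proportion (0.34) overstates the true value (0.2), and the correction recovers it only because the error rates are assumed known. When they must be estimated from a validation subsample, the uncertainty of the estimate increases.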

8.

Inference for misclassified multinomial data with covariates

Shijia Wang Liangliang Wang Tim B. Swartz 《Revue canadienne de statistique》2020,48(4):655-669

This article considers multinomial data subject to misclassification in the presence of covariates which affect both the misclassification probabilities and the true classification probabilities. A subset of the data may be subject to a secondary measurement according to an infallible classifier. Computations are carried out in a Bayesian setting where it is seen that the prior has an important role in driving the inference. In addition, a new and less problematic definition of nonidentifiability is introduced and is referred to as hierarchical nonidentifiability.

9.

Flexible semi-parametric regression of state occupational probabilities in a multistate model with right-censored data

Chathura Siriwardhana K. B. Kulasekera Somnath Datta 《Lifetime data analysis》2018,24(3):464-491

Inference for the state occupation probabilities, given a set of baseline covariates, is an important problem in survival analysis and time to event multistate data. We introduce an inverse censoring probability re-weighted semi-parametric single index model based approach to estimate conditional state occupation probabilities of a given individual in a multistate model under right-censoring. Besides obtaining a temporal regression function, we also test the potential time varying effect of a baseline covariate on future state occupation. We show that the proposed technique has desirable finite sample performances and its performance is competitive when compared with three other existing approaches. We illustrate the proposed methodology using two different data sets. First, we re-examine a well-known data set dealing with leukemia patients undergoing bone marrow transplant with various state transitions. Our second illustration is based on data from a study involving functional status of a set of spinal cord injured patients undergoing a rehabilitation program.

10.

Robust EM Continual Reassessment Method in Oncology Dose Finding

Yuan Y Yin G 《Journal of the American Statistical Association》2011,106(495):818-831

The continual reassessment method (CRM) is a commonly used dose-finding design for phase I clinical trials. Practical applications of this method have been restricted by two limitations: (1) the requirement that the toxicity outcome needs to be observed shortly after the initiation of the treatment; and (2) the potential sensitivity to the prespecified toxicity probability at each dose. To overcome these limitations, we naturally treat the unobserved toxicity outcomes as missing data, and use the expectation-maximization (EM) algorithm to estimate the dose toxicity probabilities based on the incomplete data to direct dose assignment. To enhance the robustness of the design, we propose prespecifying multiple sets of toxicity probabilities, each set corresponding to an individual CRM model. We carry out these multiple CRMs in parallel, across which model selection and model averaging procedures are used to make more robust inference. We evaluate the operating characteristics of the proposed robust EM-CRM designs through simulation studies and show that the proposed methods satisfactorily resolve both limitations of the CRM. Besides improving the MTD selection percentage, the new designs dramatically shorten the duration of the trial, and are robust to the prespecification of the toxicity probabilities.

11.

Discrete regularized discriminant analysis

Gilles Celeux Abdallah Mkhadri 《Statistics and Computing》1992,2(3):143-151

A method of regularized discriminant analysis for discrete data, denoted DRDA, is proposed. This method is related to the regularized discriminant analysis conceived by Friedman (1989) in a Gaussian framework for continuous data. Here, we are concerned with discrete data and consider the classification problem using the multinomial distribution. DRDA has been conceived in the small-sample, high-dimensional setting. This method has a median position between multinomial discrimination, the first-order independence model and kernel discrimination. DRDA is characterized by two parameters, the values of which are calculated by minimizing a sample-based estimate of future misclassification risk by cross-validation. The first parameter is a complexity parameter which provides class-conditional probabilities as a convex combination of those derived from the full multinomial model and the first-order independence model. The second parameter is a smoothing parameter associated with the discrete kernel of Aitchison and Aitken (1976). The optimal complexity parameter is calculated first, then, holding this parameter fixed, the optimal smoothing parameter is determined. A modified approach, in which the smoothing parameter is chosen first, is discussed. The efficiency of the method is examined with other classical methods through application to data.

12.

Discriminant Analysis for the von Mises-Fisher Distribution

Adelaide Figueiredo 《Communications in Statistics - Simulation and Computation》2013,42(9):1991-2003

The von Mises-Fisher distribution is widely used for modeling directional data. In this article, we derive the discriminant rules based on this distribution to assign objects into pre-existing classes. We determine a distance between two von Mises-Fisher populations and we calculate estimates of the misclassification probabilities. We also analyze the behavior of the distance between two von Mises-Fisher populations and of the estimates of the misclassification probabilities when we modify the parameters of the populations, the sample size or the dimension of the sphere. Finally, we present an example with real spherical data available in the literature.

13.

Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion

Yang Li Chenqun Yu Yichen Qin Limin Wang Jiaxu Chen Danhui Yi 《Journal of Statistical Computation and Simulation》2015,85(13):2582-2595

It is well known that statistical classifiers trained from imbalanced data lead to low true positive rates and select inconsistent significant variables. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative example of the analysis of the suboptimal health state data in traditional Chinese medicine is discussed to show the reasonable application of the proposed method.

14.

MIMCA: multiple imputation for categorical variables with multiple correspondence analysis

Vincent Audigier François Husson Julie Josse 《Statistics and Computing》2017,27(2):501-518

We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.

15.

Optimal cut-off for rare events and unbalanced misclassification costs

Raffaella Calabrese 《Journal of applied statistics》2014,41(8):1678-1693

This paper develops a method for handling two-class classification problems with highly unbalanced class sizes and misclassification costs. When the class sizes are highly unbalanced and the minority class represents a rare event, conventional classification methods tend to strongly favour the majority class, resulting in very low detection of the minority class. A method is proposed to determine the optimal cut-off for asymmetric misclassification costs and for unbalanced class sizes. Monte Carlo simulations show that this proposal performs better than the method based on the notion of classification accuracy. Finally, the proposed method is applied to empirical data on Italian small and medium enterprises to classify them into default and non-default groups.
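
To illustrate why asymmetric costs move the cut-off away from 0.5, the sketch below implements the classical expected-cost decision rule. It is not the method proposed in the article, and it assumes that the predicted probabilities are well calibrated. The function names and cost parameters are hypothetical.

```python
def cost_optimal_cutoff(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability threshold that minimises expected misclassification cost.

    Predicting "positive" costs (1 - p) * cost_false_positive in expectation,
    predicting "negative" costs p * cost_false_negative. The two are equal at
    p = cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(prob_positive: float, cutoff: float) -> int:
    """Return 1 (positive, e.g. default) when the probability reaches the cutoff."""
    return 1 if prob_positive >= cutoff else 0


# A missed rare event is taken to be nine times as costly as a false alarm.
cutoff = cost_optimal_cutoff(cost_false_positive=1.0, cost_false_negative=9.0)
print(cutoff, classify(0.2, cutoff), classify(0.2, 0.5))
```

With these illustrative costs, a case with predicted probability 0.2 is flagged as positive under the cost-based cut-off but not under the accuracy-oriented cut-off of 0.5, which is the behaviour the abstract attributes to conventional methods on rare events.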

16.

Multivariate thin plate spline estimates for the posterior probabilities in the classification problem

Miguel A. Villalobos Grace Wahba 《Communications in Statistics - Theory and Methods》2013,42(13):1449-1479

A nonparametric estimate for the posterior probabilities in the classification problem using multivariate thin plate splines is proposed. This method presents a nonparametric alternative to logistic discrimination as well as to survival curve estimation. The degree of smoothness of the estimate is determined from the data using generalized cross-validation.

17.

A three-population constrained discrimination procedure

David Patterson 《Communications in Statistics - Theory and Methods》2013,42(16):4771-4787

Classification rules with a reserve judgment option provide a way to satisfy constraints on the misclassification probabilities when there is a high degree of overlap among the populations. Constructing rules which maximize the probability of correct classification while satisfying such constraints is a difficult optimization problem. This paper uses the form of the optimal solution to develop a relatively simple and computationally fast method for three populations which has a non-parametric quality in controlling the misclassification probabilities. Simulations demonstrate that this procedure performs well.

18.

A Bayesian mixture of experts approach to covariate misclassification

Michelle Xia P. Richard Hahn Paul Gustafson 《Revue canadienne de statistique》2020,48(4):731-750

This article considers misclassification of categorical covariates in the context of regression analysis; if unaccounted for, such errors usually result in mis-estimation of model parameters. With the presence of additional covariates, we exploit the fact that explicitly modelling non-differential misclassification with respect to the response leads to a mixture regression representation. Under the framework of mixture of experts, we enable the reclassification probabilities to vary with other covariates, a situation commonly caused by misclassification that is differential on certain covariates and/or by dependence between the misclassified and additional covariates. Using Bayesian inference, the mixture approach combines learning from data with external information on the magnitude of errors when it is available. In addition to proving the theoretical identifiability of the mixture of experts approach, we study the amount of efficiency loss resulting from covariate misclassification and the usefulness of external information in mitigating such loss. The method is applied to adjust for misclassification on self-reported cocaine use in the Longitudinal Studies of HIV-Associated Lung Infections and Complications.

19.

Generalized confidence interval estimation for the mean of delta-lognormal distribution: an application to New Zealand trawl survey data

Wei-Hwa Wu Hsin-Neng Hsieh 《Journal of applied statistics》2014,41(7):1471-1485

Highly skewed and non-negative data can often be modeled by the delta-lognormal distribution in fisheries research. However, the coverage probabilities of extant interval estimation procedures are less satisfactory for small sample sizes and highly skewed data. We propose a heuristic method of estimating confidence intervals for the mean of the delta-lognormal distribution. The method uses an asymptotic generalized pivotal quantity to construct a generalized confidence interval for the mean of the delta-lognormal distribution. Simulation results show that the proposed interval estimation procedure yields satisfactory coverage probabilities, expected interval lengths and reasonable relative biases. Finally, the proposed method is applied to red cod density data as a demonstration.

20.

On Maximum Depth and Related Classifiers

ANIL K. GHOSH PROBAL CHAUDHURI 《Scandinavian Journal of Statistics》2005,32(2):327-350

Over the last couple of decades, data depth has emerged as a powerful exploratory and inferential tool for multivariate data analysis with wide-spread applications. This paper investigates the possible use of different notions of data depth in non-parametric discriminant analysis. First, we consider the situation where the prior probabilities of the competing populations are all equal and investigate classifiers that assign an observation to the population with respect to which it has the maximum location depth. We propose a different depth-based classification technique for unequal prior problems, which is also useful for equal prior cases, especially when the populations have different scatters and shapes. We use some simulated data sets as well as some benchmark real examples to evaluate the performance of these depth-based classifiers. The large sample behaviour of the misclassification rates of these depth-based non-parametric classifiers has been derived under appropriate regularity conditions.
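
The maximum depth rule for equal priors can be sketched in a few lines. The example below uses Mahalanobis depth, chosen here only because it is simple to compute; the article considers several notions of depth, and this sketch makes no claim to reproduce its classifiers. The function names and the simulated data are assumptions for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def mahalanobis_depth(x: np.ndarray, sample: np.ndarray) -> float:
    """Mahalanobis depth of point x relative to a sample (rows are observations).

    Depth is 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance
    from x to the sample mean, using the sample covariance matrix.
    """
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = x - mean
    dist_sq = float(diff @ np.linalg.solve(cov, diff))
    return 1.0 / (1.0 + dist_sq)


def max_depth_classify(x: np.ndarray, samples_by_class: dict) -> str:
    """Assign x to the class in whose training sample it is deepest."""
    depths = {label: mahalanobis_depth(x, s) for label, s in samples_by_class.items()}
    return max(depths, key=depths.get)


# Two simulated bivariate normal populations (illustrative data only).
rng = np.random.default_rng(0)
training = {
    "a": rng.normal(loc=0.0, scale=1.0, size=(200, 2)),
    "b": rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
}
print(max_depth_classify(np.array([0.0, 0.0]), training))
print(max_depth_classify(np.array([5.0, 5.0]), training))
```

Mahalanobis depth depends only on the sample mean and covariance, so it behaves much like a moment-based discriminant rule. Other depth notions mentioned in the depth literature, such as halfspace or simplicial depth, are less tied to elliptical shapes but cost more to compute.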