Found 20 similar articles (search time: 15 ms)
Many sparse linear discriminant analysis (LDA) methods have been proposed to overcome the major problems of the classic LDA in high‐dimensional settings. However, the asymptotic optimality results are limited to the case with only two classes. When there are more than two classes, the classification boundary is complicated and no explicit formulas for the classification errors exist. We consider the asymptotic optimality in the high‐dimensional settings for a large family of linear classification rules with an arbitrary number of classes. Our main theorem provides easy‐to‐check criteria for the asymptotic optimality of a general classification rule in this family as dimensionality and sample size both go to infinity and the number of classes is arbitrary. We establish the corresponding convergence rates. The general theory is applied to the classic LDA and the extensions of two recently proposed sparse LDA methods to obtain the asymptotic optimality.

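A simulation comparison of discriminant analysis and logistic regression   Total citations: 3, self-citations: 2, citations by others: 3
Using stochastic simulation, we study the resubstitution (back-classification) accuracy of discriminant analysis and logistic regression classification. The simulation results show that the resubstitution accuracy of logistic regression is better than that of discriminant analysis. As the random error increases, the difference in resubstitution accuracy between logistic regression and discriminant analysis gradually narrows. Once the random error exceeds a certain threshold, the resubstitution accuracy of logistic regression falls below that of discriminant analysis. Building on the simulation, a modified logistic regression classification is introduced; the simulation results show that it is slightly better than logistic regression.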

We consider the discriminant rule in a high-dimensional setting, i.e., when the number of feature variables p is comparable to or larger than the number of observations N. The discriminant rule must be modified to cope with the singular sample covariance matrix in high dimensions. One way to do so is to use the Moore-Penrose inverse matrix. Recently, Srivastava (2006) [Srivastava, M. S. (2006). Minimum distance classification rules for high dimensional data. J. Multivariate Anal. 97: 2057–2070] proposed a maximum likelihood ratio rule that uses the Moore-Penrose inverse of the sample covariance matrix. In this article, we consider the linear discriminant rule that uses the Moore-Penrose inverse of the sample covariance matrix (LDRMP). For a discriminant rule, the expected probability of misclassification (EPMC) is commonly used as a measure of classification accuracy. We investigate the properties of the EPMC for LDRMP in high dimensions, as well as those of the maximum likelihood rule given by Srivastava (2006). Our asymptotic results show that the classification accuracy of LDRMP depends on a new distance. Additionally, our asymptotic result is verified by Monte Carlo simulation.
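
As a rough illustration of the idea in this abstract, the sketch below builds a two-class linear discriminant rule in which the inverse of the pooled sample covariance matrix is replaced by its Moore-Penrose inverse. It is a minimal sketch under stated assumptions, not the authors' exact LDRMP definition: the function names `fit_pinv_lda` and `classify`, the equal-prior cutoff at zero, and the simulated data are all hypothetical.

```python
import numpy as np

def fit_pinv_lda(x1, x2):
    """Two-class linear rule using the Moore-Penrose inverse of the pooled covariance.

    Hypothetical helper for illustration; x1 and x2 are (n_i x p) arrays.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled sample covariance; it is singular whenever p > n1 + n2 - 2.
    centered = np.vstack([x1 - m1, x2 - m2])
    s = centered.T @ centered / (n1 + n2 - 2)
    # The Moore-Penrose inverse stands in for the ordinary inverse.
    w = np.linalg.pinv(s) @ (m1 - m2)
    b = -0.5 * w @ (m1 + m2)
    return w, b

def classify(x, w, b):
    """Assign class 1 where the linear score is positive, class 2 otherwise."""
    return np.where(x @ w + b > 0, 1, 2)

# Simulated high-dimensional example (p = 50 features, 10 observations per class).
rng = np.random.default_rng(0)
p, n = 50, 10
shift = np.zeros(p)
shift[:5] = 3.0
x1 = rng.normal(size=(n, p)) + shift
x2 = rng.normal(size=(n, p))
w, b = fit_pinv_lda(x1, x2)
labels = classify(np.vstack([x1, x2]), w, b)
```

Because p exceeds the total sample size here, the pooled covariance matrix cannot be inverted directly, which is the situation the Moore-Penrose inverse is meant to handle.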

This article deals with a criterion for selecting variables in multiple-group discriminant analysis of high-dimensional data. The variable selection models considered for discriminant analysis in Fujikoshi (1985, 2002) [Fujikoshi, Y. (1985). Selection of variables in discriminant analysis and canonical correlation analysis. In: Krishnaiah, P. R., ed. Multivariate Analysis. Vol. VI. Amsterdam: North-Holland, pp. 219–236; Fujikoshi, Y. (2002). Selection of variables for discriminant analysis in a high-dimensional case. Sankhya Ser. A 64: 256–257] are the ones based on additional information due to Rao (1948, 1970) [Rao, C. R. (1948). Tests of significance in multivariate analysis. Biometrika 35: 58–79; Rao, C. R. (1970). Inference on discriminant function coefficients. In: Bose, R. C., ed. Essays in Probability and Statistics. Chapel Hill, NC: University of North Carolina Press, pp. 537–602]. Our criterion is based on the Akaike information criterion (AIC) for this model. The AIC has been successfully used in the literature in model selection when the dimension p is smaller than the sample size N. However, the case when p > N has not been considered in the literature, because the MLE cannot be computed owing to the singularity of the within-group covariance matrix. A popular method used to address the singularity problem in high-dimensional classification is the regularized method, which replaces the within-group sample covariance matrix with a ridge-type covariance estimate to stabilize the estimate. In this article, we propose an AIC-type criterion obtained by replacing the MLE of the within-group covariance matrix with a ridge-type estimator. This idea follows Srivastava and Kubokawa (2008) [Srivastava, M. S., Kubokawa, T. (2008). Akaike information criterion for selecting components of the mean vector in high dimensional data with fewer observations. J. Japan Statist. Soc. 38: 259–283]. Simulations revealed that our proposed criterion performs well.
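
The regularization step described in this abstract can be sketched as follows. The abstract does not state the exact form of the ridge-type estimator, so the form used here (adding a constant `lam` to the diagonal of the pooled within-group covariance) is an assumption, as are the helper names `pooled_within_cov` and `ridge_cov`, the value of `lam`, and the simulated data.

```python
import numpy as np

def pooled_within_cov(groups):
    """Pooled within-group sample covariance for a list of (n_g x p) arrays."""
    n_total = sum(len(g) for g in groups)
    centered = np.vstack([g - g.mean(axis=0) for g in groups])
    return centered.T @ centered / (n_total - len(groups))

def ridge_cov(s, lam):
    """Assumed ridge-type estimate: add lam > 0 to every diagonal entry."""
    return s + lam * np.eye(s.shape[0])

# Three groups of 4 observations each in p = 20 dimensions, so N = 12 < p.
rng = np.random.default_rng(1)
p = 20
groups = [rng.normal(size=(4, p)) for _ in range(3)]
s = pooled_within_cov(groups)
s_ridge = ridge_cov(s, lam=0.1)

rank_s = np.linalg.matrix_rank(s)                   # below p, so s is singular
min_eig_ridge = np.linalg.eigvalsh(s_ridge).min()   # strictly positive
```

The unregularized matrix has rank at most N minus the number of groups, while the ridge-type matrix has all eigenvalues bounded away from zero and can therefore be inverted.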

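To extend the ROC curve used in two-population discriminant analysis to the case of multiple populations, an ROC surface for multi-population discriminant analysis is obtained from the ROC curve transformation used in the two-population case, and some of its properties are studied.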

We develop functional data analysis techniques using the differential geometry of a manifold of smooth elastic functions on an interval in which the functions are represented by a log-speed function and an angle function. The manifold's geometry provides a method for computing a sample mean function and principal components on tangent spaces. Using tangent principal component analysis, we estimate probability models for functional data and apply them to functional analysis of variance, discriminant analysis, and clustering. We demonstrate these tasks using a collection of growth curves of children aged 1–18.

In this article we study a linear discriminant function of multiple m-variate observations at u-sites and over v-time points under the assumption of multivariate normality. We assume that the m-variate observations have a separable mean vector structure and a “jointly equicorrelated covariance” structure. The new discriminant function is very effective in discriminating individuals in a small sample scenario. No closed-form expression exists for the maximum likelihood estimates of the unknown population parameters, and their direct computation is nontrivial. An iterative algorithm is proposed to calculate the maximum likelihood estimates of these unknown parameters. A discriminant function is also developed for unstructured mean vectors. The new discriminant functions are applied to simulated data sets as well as to a real data set. Results illustrating the benefits of the new classification methods over the traditional one are presented.


Despite the popularity of the general linear mixed model for data analysis, power and sample size methods and software are not generally available for commonly used test statistics and reference distributions. Statisticians resort to simulations with homegrown and uncertified programs or rough approximations which are misaligned with the data analysis. For a wide range of designs with longitudinal and clustering features, we provide accurate power and sample size approximations for inference about fixed effects in the linear models we call reversible. We show that under widely applicable conditions, the general linear mixed-model Wald test has noncentral distributions equivalent to well-studied multivariate tests. In turn, exact and approximate power and sample size results for the multivariate Hotelling–Lawley test provide exact and approximate power and sample size results for the mixed-model Wald test. The calculations are easily computed with a free, open-source product that requires only a web browser to use. Commercial software can be used for a smaller range of reversible models. Simple approximations allow accounting for modest amounts of missing data. A real-world example illustrates the methods. Sample size results are presented for a multicenter study on pregnancy. The proposed study, an extension of a funded project, has clustering within clinic. Exchangeability among the participants allows averaging across them to remove the clustering structure. The resulting simplified design is a single-level longitudinal study. Multivariate methods for power provide an approximate sample size. All proofs and inputs for the example are in the supplementary materials (available online).

The explicit forms of the minimum variance quadratic unbiased estimators (MIVQUEs) of the variance components are given for simple linear regression with onefold nested error. The resulting estimators are more efficient as the ratio of the initial variance components estimates increases and are asymptotically efficient as the ratio tends to infinity.

Despite tremendous effort on different designs with cross-sectional data, little research has been conducted on sample size calculation and power analysis under repeated measures designs. In addition to the time-averaged difference, changes in mean response over time (CIMROT) are of primary interest in repeated measures analysis. We generalized the sample size calculation and power analysis equations for CIMROT to allow unequal sample sizes between groups for both continuous and binary measures, evaluated the performance of the proposed methods through simulation, and compared our approach to that of a two-stage model formulation. We also created a software procedure to implement the proposed methods.

We develop a sample size methodology that achieves specified Type-1 and Type-2 error rates when comparing the survivor functions of multiple treatment groups versus a control group. The designs will control family-wise Type-1 error rate. We assume the family of Weibull distributions adequately describes the underlying survivor functions, and we separately consider three of the most common study scenarios: (a) complete samples; (b) Type-1 censoring with a common censoring time; and (c) Type-1 censoring with an accrual period. A mice longevity study comparing the effect on survival of multiple low-calorie diets is used to motivate our work on this problem.

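After introducing two ways of generating quadratic trend models, we show that the two have an intrinsic relationship. Taking the implicit trend model as the data generating process and the explicit trend model as the object of estimation, we carry out parameter estimation and the corresponding hypothesis tests. The theoretical analysis shows that the parameter estimates, t-test statistics, and joint F-test statistics of the explicit trend model have nonstandard limiting distributions and are highly significant. When the explicit trend model is the data generating process and the implicit trend model is the object of estimation, the results show that the implicit trend model is a unit root process with a trend term. The LLR test statistic is used to discriminate between the two classes of models and is examined by simulation; the simulation results support the above theoretical conclusions and show that the LLR statistic can distinguish the two models.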

In practice a degree of uncertainty will always exist concerning what specification to adopt for the deterministic trend function when running unit root tests. While most macroeconomic time series appear to display an underlying trend, it is often far from clear whether this component is best modeled as a simple linear trend (so that long-run growth rates are constant) or by a more complicated nonlinear trend function which may, for instance, allow the deterministic trend component to evolve gradually over time. In this article, we consider the effects on unit root testing of allowing for a local quadratic trend, a simple yet very flexible example of the latter. Where a local quadratic trend is present but not modeled, we show that the quasi-differenced detrended Dickey–Fuller-type test of Elliott et al. (1996) [Elliott, G., Rothenberg, T. J., Stock, J. H. (1996). Efficient tests for an autoregressive unit root. Econometrica 64: 813–836] has both size and power which tend to zero asymptotically. An extension of the Elliott et al. (1996) approach to allow for a quadratic trend resolves this problem but is shown to result in large power losses relative to the standard detrended test when no quadratic trend is present. We consequently propose a simple and practical approach to dealing with this form of uncertainty based on a union of rejections-based decision rule whereby the unit root is rejected whenever either of the detrended or quadratic detrended unit root tests rejects. A modification of this basic strategy is also suggested which further improves on the properties of the procedure. An application to relative primary commodity price data highlights the empirical relevance of the methods outlined in this article. 
A by-product of our analysis is the development of a test for the presence of a quadratic trend which is robust to whether the data admit a unit root.
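
The basic union of rejections decision rule described in this abstract can be sketched in a few lines. This is only an illustration of the decision logic: the function name `union_of_rejections`, the test statistics, and the critical values below are hypothetical placeholders, and the modification mentioned in the abstract is not reproduced. Dickey–Fuller-type tests reject in the left tail, so each test rejects when its statistic falls below its critical value.

```python
def union_of_rejections(stat_linear, stat_quadratic, cv_linear, cv_quadratic):
    """Reject the unit root null if either left-tailed test rejects."""
    reject_linear = stat_linear < cv_linear            # linear-detrended test
    reject_quadratic = stat_quadratic < cv_quadratic   # quadratic-detrended test
    return reject_linear or reject_quadratic

# Hypothetical statistics and critical values, for illustration only.
decision = union_of_rejections(-2.1, -3.9, cv_linear=-2.9, cv_quadratic=-3.5)
no_rejection = union_of_rejections(-2.1, -3.0, cv_linear=-2.9, cv_quadratic=-3.5)
```

Because the rule rejects when either test rejects, its overall size can exceed the nominal level of each individual test if the critical values are left unadjusted, so in practice the critical values would need to be chosen with that in mind.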

To shorten the drug lag or the time lag for approval, simultaneous drug development, submission, and approval around the world may be desirable. Recently, multi-regional trials have attracted much attention from sponsors as well as regulatory authorities. Current methods for sample size determination are based on the assumption that the true treatment effect is uniform across regions. However, unrecognized heterogeneity among patients, such as ethnic or genetic factors, will affect patients’ survival. In this article, we address the design of a multi-regional trial when treatment effects differ among regions because of unrecognized heterogeneity that interacts with treatment. The log-rank test is employed to deal with the heterogeneous effect size among regions. The test statistic for the overall treatment effect is used to determine the total sample size for a multi-regional trial, and the consistent trend is used to rationalize the partition of the sample size among regions.

The present study investigates the performance of five discrimination methods for data consisting of a mixture of continuous and binary variables. The methods are Fisher’s linear discrimination, logistic discrimination, quadratic discrimination, a kernel model and an independence model. Six-dimensional data, consisting of three binary and three continuous variables, are simulated according to a location model. The results show an almost identical performance for Fisher’s linear discrimination and logistic discrimination. Only in situations with independently distributed variables does the independence model have a reasonable discriminatory ability for the dimensionality considered. If the log likelihood ratio is non-linear with respect to its continuous and binary parts, the quadratic discrimination method is substantially better than linear and logistic discrimination, followed by the kernel method. A very good performance is obtained when, in every situation, the better of linear and quadratic discrimination is used.

Optimal response-adaptive designs in Phase III clinical trial settings are attracting more and more interest. In the present article, an optimal response-adaptive design is introduced for more than two treatments. We minimize an objective function subject to more than one inequality constraint. For this purpose, we propose an extensive computer search algorithm. The proposed procedure is illustrated with extensive numerical computation and simulations. A real data set is used to illustrate the proposed methodology.

We consider a class of test statistics including the Dempster trace criterion in the case of two groups without assuming equal covariance matrices. The test statistics in the class are valid when the dimension is larger than the sample size. We obtain asymptotic distributions of the test statistics in the class and use these distributions to derive the limiting power in each case. We obtain the most powerful test in the class with respect to this limiting power.

Traditionally, sphericity (i.e., independence and homoscedasticity for raw data) is put forward as the condition to be satisfied by the variance–covariance matrix of at least one of the two observation vectors analyzed for correlation, for the unmodified t test of significance to be valid under the Gaussian and constant population mean assumptions. In this article, the author proves that the sphericity condition is too strong and a weaker (i.e., more general) sufficient condition for valid unmodified t testing in correlation analysis is circularity (i.e., independence and homoscedasticity after linear transformation by orthonormal contrasts), to be satisfied by the variance–covariance matrix of one of the two observation vectors. Two other conditions (i.e., compound symmetry for one of the two observation vectors; absence of correlation between the components of one observation vector, combined with a particular pattern of joint heteroscedasticity in the two observation vectors) are also considered and discussed. When both observation vectors possess the same variance–covariance matrix up to a positive multiplicative constant, the circularity condition is shown to be necessary and sufficient. “Observation vectors” may designate partial realizations of temporal or spatial stochastic processes as well as profile vectors of repeated measures. From the proof, it follows that an effective sample size appropriately defined can measure the discrepancy from the more general sufficient condition for valid unmodified t testing in correlation analysis with autocorrelated and heteroscedastic sample data. The proof is complemented by a simulation study. 
Finally, the differences between the role of the circularity condition in the correlation analysis and its role in the repeated measures ANOVA (i.e., where it was first introduced) are scrutinized, and the link between the circular variance–covariance structure and the centering of observations with respect to the sample mean is emphasized.

Recently, there has been great interest in the analysis of longitudinal data in which the observation process is related to the longitudinal process. In the literature, the observation process has commonly been regarded as a recurrent event process. Sometimes each observation also has a duration, and the process is then referred to as a recurrent episode process. The medical cost related to hospitalization is an example. We propose a conditional modeling approach that takes into account both the informative observation process and the observation duration. We conducted simulation studies to assess the performance of the method and applied it to a dataset of medical costs.

In this article, we analyze the three-way bootstrap estimate of the variance of the reader-averaged nonparametric area under the receiver operating characteristic (ROC) curve. The setting for this work is medical imaging, and the experimental design involves sampling from three distributions: a set of normal and diseased cases (patients), and a set of readers (doctors). The experiment we consider is fully crossed in that each reader reads each case. A reading generates a score that indicates the reader's level of suspicion that the patient is diseased. The distribution of scores for the normal patients is compared to the distribution of scores for the diseased patients via an ROC curve, and the area under the ROC curve (AUC) summarizes the reader's diagnostic ability to separate the normal patients from the diseased ones. We find that the bootstrap estimate of the variance of the reader-averaged AUC is biased, and we represent this bias in terms of moments of success outcomes. This representation helps unify and improve several current methods for multi-reader multi-case (MRMC) ROC analysis.
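
The quantity studied in this abstract, the reader-averaged nonparametric AUC, can be illustrated with a short sketch. It uses the standard Mann-Whitney form of the nonparametric AUC (the proportion of diseased/normal pairs in which the diseased case scores higher, counting ties as one half). The function names and the toy scores are hypothetical, and the three-way bootstrap variance estimate itself is not reproduced here.

```python
import numpy as np

def auc_mann_whitney(normal_scores, diseased_scores):
    """Nonparametric AUC: fraction of (diseased, normal) pairs ranked correctly."""
    normal = np.asarray(normal_scores, dtype=float)
    diseased = np.asarray(diseased_scores, dtype=float)
    # Pairwise differences between every diseased score and every normal score.
    diff = diseased[:, None] - normal[None, :]
    # A correct ordering counts 1, a tie counts 1/2, a wrong ordering counts 0.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def reader_averaged_auc(normal_by_reader, diseased_by_reader):
    """Average the per-reader AUCs in a fully crossed design (rows = readers)."""
    aucs = [auc_mann_whitney(n, d)
            for n, d in zip(normal_by_reader, diseased_by_reader)]
    return float(np.mean(aucs))

# Toy fully crossed data: 2 readers each score the same 3 normal and 2 diseased cases.
normal_by_reader = [[1, 2, 3], [1, 1, 2]]
diseased_by_reader = [[2, 4], [3, 3]]
avg_auc = reader_averaged_auc(normal_by_reader, diseased_by_reader)
```

In this toy example the first reader orders 4.5 of 6 pairs correctly (AUC 0.75) and the second orders all 6 correctly (AUC 1.0), so the reader-averaged AUC is 0.875.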
