Similar literature
 20 similar documents found (search time: 31 ms)
1.
Takemura and Sheena [A. Takemura, Y. Sheena, Distribution of eigenvalues and eigenvectors of Wishart matrix when the population eigenvalues are infinitely dispersed and its application to minimax estimation of covariance matrix, J. Multivariate Anal. 94 (2005) 271–299] derived the asymptotic joint distribution of the eigenvalues and the eigenvectors of a Wishart matrix when the population eigenvalues become infinitely dispersed. They also showed necessary conditions for an estimator of the population covariance matrix to be tail minimax for typical loss functions by calculating the asymptotic risk of the estimator. In this paper, we further examine those distributions and risks by means of an asymptotic expansion. We obtain the asymptotic expansion of the distribution function of relevant elements of the sample eigenvalues and eigenvectors. We also derive the asymptotic expansion of the risk function of a scale and orthogonally equivariant estimator with respect to Stein’s loss. As an application, we prove non-minimaxity of Stein’s and Haff’s estimators, which has been an open problem for a long time.

2.
Lee S, Zou F, Wright FA. Annals of Statistics, 2010, 38(6): 3605-3629
A number of settings arise in which it is of interest to predict Principal Component (PC) scores for new observations using data from an initial sample. In this paper, we demonstrate that naive approaches to PC score prediction can be substantially biased towards 0 in the analysis of large matrices. This phenomenon is largely related to known inconsistency results for sample eigenvalues and eigenvectors as both dimensions of the matrix increase. For the spiked eigenvalue model for random matrices, we expand the generality of these results, and propose bias-adjusted PC score prediction. In addition, we compute the asymptotic correlation coefficient between PC scores from sample and population eigenvectors. Simulation and real data examples from the genetics literature show the improved bias and numerical properties of our estimators.

3.
In estimating the eigenvalues of the covariance matrix of a multivariate normal population, the usual estimates are the eigenvalues of the sample covariance matrix. It is well known that these estimates are biased. This paper investigates obtaining improved eigenvalue estimates through improved estimates of the characteristic polynomial, which is a function of the sample eigenvalues. A numerical study investigates the improvements evaluated under both a squared error and an entropy loss function.

4.
Influence functions are derived for covariance structure analysis with equality constraints, where the parameters are estimated by minimizing a discrepancy function between the assumed covariance matrix and the sample covariance matrix. As a special case, maximum likelihood exploratory factor analysis is studied in detail with a numerical example. A comparison is made with the results of Tanaka and Odaka (1989), who proposed a sensitivity analysis procedure for maximum likelihood exploratory factor analysis using the perturbation expansion of a certain function of the eigenvalues and eigenvectors of a real symmetric matrix. The present paper also generalizes Tanaka, Watadani and Moon (1991) to the case with equality constraints.

5.
The condition of fixed sample size is essential for the existence of a Sen-Yates-Grundy form variance and its design unbiased estimator, in the problem of estimating the mean, variance and covariance of a finite population.
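
For orientation, the Sen-Yates-Grundy form mentioned in this abstract can be sketched for the simplest case, the Horvitz-Thompson estimator of a population total under a fixed-size design. This is a minimal illustration of the textbook formula, not the paper's treatment of variance and covariance estimation. The function name `syg_variance_estimate` and its argument layout are assumptions made here.

```python
from itertools import combinations


def syg_variance_estimate(y, pi, pi_joint):
    """Sen-Yates-Grundy variance estimate for a Horvitz-Thompson total.

    y        : observed values of the n sampled units
    pi       : first-order inclusion probabilities of those units
    pi_joint : dict mapping index pairs (i, j) with i < j to the joint
               inclusion probability of sampled units i and j
    (Hypothetical interface, chosen for this illustration only.)
    """
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        pij = pi_joint[(i, j)]
        # Pairwise weight (pi_i * pi_j - pi_ij) / pi_ij
        weight = (pi[i] * pi[j] - pij) / pij
        # Squared difference of the expanded values y_i / pi_i
        total += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total


# Simple random sampling without replacement, n = 2 out of N = 3:
# pi_i = 2/3 for every unit and pi_ij = 1/3 for every pair.
estimate = syg_variance_estimate([1.0, 3.0], [2 / 3, 2 / 3], {(0, 1): 1 / 3})
```

In this small case the result agrees with the familiar simple-random-sampling expression N^2 (1 - n/N) s^2 / n, which gives a quick sanity check on the pairwise form.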

6.
Principal component analysis is a popular dimension reduction technique often used to visualize high‐dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high dimensionality will cause biased or inconsistent eigenvector estimates, but in practice, the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high‐dimensional data, even when there are few observations compared with the number of variables. Our argument is twofold: First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will be scaled and rotated versions of the population scores, asymptotically. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can be used to visualize the population structures, even in extreme high‐dimensional situations.

7.
Multivariate mixtures of normals with unknown number of components (total citations: 2; self-citations: 0; citations by others: 2)
We present a full Bayesian analysis of finite mixtures of multivariate normals with an unknown number of components. We adopt reversible jump Markov chain Monte Carlo and we construct, in a manner similar to that of Richardson and Green (1997), split and merge moves that produce good mixing of the Markov chains. The split moves are constructed on the space of eigenvectors and eigenvalues of the current covariance matrix so that the proposed covariance matrices are positive definite. Our proposed methodology has applications in classification and discrimination as well as heterogeneity modelling. We test our algorithm with real and simulated data.

8.
In this note we propose two procedures for testing homogeneity of covariance matrices that are both extensions of Hartley's (1940) test for equality of variances. The first is a two-stage procedure where the first step is a simple test for equality of the largest eigenvalues, and corresponding eigenvectors, of the covariance matrices. The second is based on projection pursuit and seems harder to apply in practice.

9.
Double arrays of n rows and p columns can be regarded as n drawings from some p-dimensional population. A sequence of such arrays is considered. Principal component analysis of each array yields sequences of sample principal components and eigenvalues. The continuity of these sequences, in the sense of convergence with probability one and convergence in probability, is investigated; it appears to be informative for pattern study and prediction of principal components. Various features of the paths of sequences of population principal components are highlighted through an example.

10.
This paper proposes a selection procedure to estimate the multiplicity of the smallest eigenvalue of the covariance matrix. The unknown number of signals present in radar data can be formulated as the difference between the total number of components in the observed multivariate data vector and the multiplicity of the smallest eigenvalue. In the observed multivariate data, the smallest eigenvalues of the sample covariance matrix may in fact be grouped about some nominal value, as opposed to being identically equal. We propose a selection procedure to estimate the multiplicity of the common smallest eigenvalue, which is significantly smaller than the other eigenvalues. We derive the probability of a correct selection, P(CS), and the least favorable configuration (LFC) for our procedures. Under the LFC, the P(CS) attains its minimum over the preference zone of all eigenvalues. Therefore, a minimum sample size can be determined from the P(CS) under the LFC, P(CS|LFC), in order to implement our new procedure with a guaranteed probability requirement. Numerical examples are presented in order to illustrate our proposed procedure.

11.
Wilks’ ratio statistic can be defined in terms of the ratio of the sample generalized variances of two non-independent estimators of the same covariance matrix. Recently this statistic has been proposed as a control statistic for monitoring changes in the covariance matrix of a multivariate normal process in a Phase II situation, particularly when the dimension is larger than the sample size. In this article we derive a technique for decomposing Wilks’ ratio statistic into the product of independent factors that can be associated with the components of the covariance matrix. With these results, we demonstrate that, when a signal is detected in a control procedure for the Phase II monitoring of process variability using the ratio statistic, the signaling value can be decomposed and the process variables contributing to the signal can be specifically identified.

12.
The common principal components (CPC) model provides a way to model the population covariance matrices of several groups by assuming a common eigenvector structure. When appropriate, this model can provide covariance matrix estimators of which the elements have smaller standard errors than when using either the pooled covariance matrix or the per group unbiased sample covariance matrix estimators. In this article, a regularized CPC estimator under the assumption of a common (or partially common) eigenvector structure in the populations is proposed. After estimation of the common eigenvectors using the Flury–Gautschi (or other) algorithm, the off-diagonal elements of the nearly diagonalized covariance matrices are shrunk towards zero and multiplied with the orthogonal common eigenvector matrix to obtain the regularized CPC covariance matrix estimates. The optimal shrinkage intensity per group can be estimated using cross-validation. The efficiency of these estimators compared to the pooled and unbiased estimators is investigated in a Monte Carlo simulation study, and the regularized CPC estimator is applied to a real dataset to demonstrate the utility of the method.

13.
In this article, we present a straightforward Bonferroni approach for determining sample size for estimating the mean vector of a multivariate population under two scenarios: (1) a pre-specified overall confidence level is desired; and (2) a pre-specified confidence level needs to be guaranteed for each individual variable. It is demonstrated that correlation between variables helps reduce the sample size. The formula to calculate the reduced sample size is derived. A binormal example is presented to illustrate the effect of correlation on sample size reduction for various values of the correlation coefficient.
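
As a rough illustration of the first scenario, the plain Bonferroni calculation can be sketched as follows. The sketch assumes known standard deviations and a normal approximation, and it ignores correlation between variables, so it does not reproduce the reduced sample size formula that the abstract says the article derives. The function name `bonferroni_sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def bonferroni_sample_size(sigmas, half_widths, alpha=0.05):
    """Smallest n giving overall confidence of at least 1 - alpha (Bonferroni).

    sigmas      : assumed known standard deviation of each variable
    half_widths : desired confidence-interval half-width for each variable
    alpha       : overall error rate, split equally across the p variables
    (Hypothetical helper; correlation between variables is not used.)
    """
    p = len(sigmas)
    # Two-sided critical value at per-variable level alpha / p
    z = NormalDist().inv_cdf(1 - alpha / (2 * p))
    # Each variable needs n >= (z * sigma / d)^2; take the most demanding one
    return max(math.ceil((z * s / d) ** 2) for s, d in zip(sigmas, half_widths))


n_one = bonferroni_sample_size([1.0], [0.5])            # single variable
n_two = bonferroni_sample_size([1.0, 1.0], [0.5, 0.5])  # two variables
```

Because the per-variable level shrinks as the number of variables grows, the two-variable requirement is larger than the single-variable one. That conservatism is what an adjustment for correlation would aim to reduce.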

14.
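巩红禹 (Gong Hongyu), 陈雅 (Chen Ya). 统计研究 (Statistical Research), 2018, 35(12): 113-122
This paper addresses two problems: improving sample representativeness and conducting multi-objective surveys. First, it proposes a new multi-objective sampling method for improving sample representativeness that combines increasing the sample size with adjusting the sample structure, namely a balanced design for supplementary samples. Supplementary units are drawn so that, combined with the original sample, they form a new balanced sample, which reduces the structural deviation between the sample and the population relative to the initial sample. A balanced sample is one for which the Horvitz-Thompson estimator of the auxiliary-variable totals equals the true population totals. Second, by selecting auxiliary variables that are correlated with several target parameters, a balanced sample makes a single sample well representative for different target parameters and thereby supports a multi-objective survey. Using county-level data from the Sixth National Population Census of 2010 and several selected target parameters, a post hoc evaluation of the balanced sample obtained after supplementation shows that the supplementary balanced design effectively improves the sample structure, brings it close to the population structure, and reduces the error of the target estimates. It also shows that a balanced sampling design can support multi-objective surveys and improve the efficiency with which the sample is used.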
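
The balance condition defined in this abstract (Horvitz-Thompson estimates of the auxiliary totals equal to the true population totals) can be checked with a few lines of code. This is a minimal sketch of the definition only, not the supplementary-sample selection procedure of the paper. The names `ht_total` and `is_balanced`, and the tolerance arguments, are assumptions made here.

```python
import math


def ht_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    return sum(v / p for v, p in zip(values, inclusion_probs))


def is_balanced(aux_sample, inclusion_probs, aux_population_totals,
                rel_tol=1e-6, abs_tol=1e-9):
    """Check the balance condition on every auxiliary variable.

    aux_sample            : one row per sampled unit, one column per
                            auxiliary variable
    aux_population_totals : known population total of each auxiliary variable
    (Hypothetical helper; tolerances allow for approximate balance.)
    """
    for k, true_total in enumerate(aux_population_totals):
        column = [row[k] for row in aux_sample]
        estimate = ht_total(column, inclusion_probs)
        if not math.isclose(estimate, true_total, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


# Two sampled units, each with inclusion probability 0.5, and two auxiliary
# variables: a constant 1 (so the estimated total is the population size)
# and a size measure.
sample_aux = [[1.0, 10.0], [1.0, 20.0]]
balanced = is_balanced(sample_aux, [0.5, 0.5], [4.0, 60.0])
```

Including a constant auxiliary variable, as in the usage lines, makes the check also require that the estimated population size match the true one.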

15.
The limiting distributions of jackknife statistics for eigenvalues of a sample covariance matrix are derived under nonnormal populations. Numerical examples are also given for normal and nonnormal populations.

16.
This article enlarges the covariance configurations on which classical linear discriminant analysis is based by considering the four models arising from the spectral decomposition when the eigenvalue and/or eigenvector matrices are allowed to vary or not between groups. As in the classical approach, the assessment of these configurations is accomplished via a test on the training set. The discrimination rule is then built upon the configuration provided by the test, considering or not the unlabeled data. Numerical experiments, on simulated and real data, have been performed to evaluate the gain of our proposal with respect to linear discriminant analysis.

17.
In this paper we study the sampling properties of a test statistic which has important applications in the area of linear stochastic control systems with multi-inputs and multi-outputs. The statistic is the ratio of a partial sum of the eigenvalues of a sample covariance matrix and its trace. It turns out that using a method due to Sugiura we may derive a useful approximation for its distribution up to and including terms of order 1/n, where n denotes the appropriate size. Numerical illustrations using real data are given.
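
The statistic described in this abstract is easy to compute once the eigenvalues are available, since the trace of a covariance matrix equals the sum of its eigenvalues. The sketch below assumes the partial sum runs over the k largest eigenvalues, which the abstract does not specify. The function name `eigenvalue_proportion` is hypothetical.

```python
def eigenvalue_proportion(eigenvalues, k):
    """Ratio of the sum of the k largest eigenvalues to the trace.

    eigenvalues : eigenvalues of a sample covariance matrix, in any order
    k           : number of leading eigenvalues in the partial sum
                  (assumption: the partial sum uses the largest ones)
    """
    ordered = sorted(eigenvalues, reverse=True)
    if not 0 <= k <= len(ordered):
        raise ValueError("k must be between 0 and the number of eigenvalues")
    trace = sum(ordered)  # the trace equals the sum of all eigenvalues
    if trace <= 0:
        raise ValueError("the trace must be positive")
    return sum(ordered[:k]) / trace


# Four eigenvalues; the two largest account for 7 of a total of 10.
ratio = eigenvalue_proportion([1.0, 4.0, 2.0, 3.0], 2)
```

The eigenvalues themselves would come from any symmetric eigenvalue routine; they are taken as input here to keep the sketch free of external dependencies.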

18.
Summary.  Although the covariance matrices corresponding to different populations are unlikely to be exactly equal they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, whereas the correlation between other pairs may be consistently negative. In such cases much of the similarity across covariance matrices can be described by similarities in their principal axes, which are the axes that are defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. For example, eigenvector matrices can be pooled to form a central set of principal axes and, to the extent that the axes are similar, covariance estimates for populations having small sample sizes can be stabilized by shrinking their principal axes towards the across-population centre. To this end, the paper develops a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of 'centre' and 'spread' for a population of orthogonal matrices.

19.
Sample covariance matrices play a central role in numerous popular statistical methodologies, for example principal components analysis, Kalman filtering and independent component analysis. However, modern random matrix theory indicates that, when the dimension of a random vector is not negligible with respect to the sample size, the sample covariance matrix demonstrates significant deviations from the underlying population covariance matrix. There is an urgent need to develop new estimation tools in such cases with high‐dimensional data to recover the characteristics of the population covariance matrix from the observed sample covariance matrix. We propose a novel solution to this problem based on the method of moments. When the parametric dimension of the population spectrum is finite and known, we prove that the proposed estimator is strongly consistent and asymptotically Gaussian. Otherwise, we combine the first estimation method with a cross‐validation procedure to select the unknown model dimension. Simulation experiments demonstrate the consistency of the proposed procedure. We also indicate possible extensions of the proposed estimator to the case where the population spectrum has a density.

20.
Allocation of samples in stratified and/or multistage sampling is one of the central issues of sampling theory. In a survey of a population, constraints on the precision of estimators of subpopulation parameters often have to be taken into account when allocating the sample. Such problems are often solved with mathematical programming procedures. In many situations it is desirable to allocate the sample in a way that forces the precision of estimates at the subpopulation level to be both optimal and identical, while constraints on the total (expected) size of the sample (or samples, in two-stage sampling) are imposed. Here our main concern is two-stage sampling schemes. We show that such a problem, in a wide class of sampling plans, has an elegant mathematical and computational solution. This is achieved through a suitable definition of the optimization problem, which makes it possible to solve it in a linear algebra setting involving eigenvalues and eigenvectors of matrices defined in terms of some population quantities. As a final result, we obtain a very simple and relatively universal method for calculating the subpopulation optimal and equal-precision allocation, which is based on one of the most standard algorithms of linear algebra (available, e.g., in R software). Theoretical solutions are illustrated through a numerical example based on the Labour Force Survey. Finally, we would like to stress that the method we describe makes it possible to accommodate, quite automatically, different levels of precision priority for subpopulations.
