Similar Articles
20 similar articles found.
1.
Two new statistics are proposed for testing the identity of a high-dimensional covariance matrix. Applying large-dimensional random matrix theory, we study the asymptotic distributions of the proposed statistics in the regime where the dimension p and the sample size n tend to infinity proportionally. The proposed tests accommodate both the situation where the data dimension is much larger than the sample size and the situation where the population distribution is non-Gaussian. Numerical studies demonstrate that the proposed tests have good empirical power over a wide range of dimensions and sample sizes.

2.
A. Roy & D. Klein, Statistics, 2018, 52(2): 393–408
Testing hypotheses about the structure of a covariance matrix for doubly multivariate data is often considered in the literature. In this paper Rao's score test (RST) is derived to test for a block exchangeable covariance matrix, or block compound symmetry (BCS) covariance structure, under the assumption of multivariate normality. It is shown that the empirical distribution of the RST statistic under the null hypothesis is independent of the true values of the mean and the matrix components of the BCS structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Simulation studies are performed to assess sample size requirements and to estimate the empirical quantiles of the null distribution of the test statistic. The RST procedure is illustrated on a real data set from a medical study.

3.
In this article, we consider a robust method of estimating a realized covariance matrix, calculated as the sum of cross products of intraday high-frequency returns. According to recent articles in financial econometrics, the realized covariance matrix is essentially contaminated with market microstructure noise. Although techniques for removing noise from the matrix have been studied since the early 2000s, they have primarily investigated low-dimensional covariance matrices with sufficiently large sample sizes. We focus on noise-robust covariance estimation under the converse circumstances, that is, a high-dimensional covariance matrix, possibly with a small sample size. For the estimation, we utilize a statistical hypothesis test based on the fact that the largest eigenvalue of the covariance matrix asymptotically follows a Tracy–Widom distribution. The null hypothesis is that the log returns are not pure noise; if a sample eigenvalue is larger than the relevant critical value, we fail to reject the null hypothesis. The simulation results show that the estimator studied here outperforms others as measured by mean squared error. The empirical analysis shows that the proposed estimator can be adopted to forecast future covariance matrices using real data.
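The eigenvalue screening step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' estimator: it compares sample eigenvalues to the upper edge of the Marchenko–Pastur bulk (around which the Tracy–Widom fluctuations occur) rather than to an exact Tracy–Widom quantile, and the function names and the assumed noise variance `sigma2` are illustrative.

```python
import numpy as np

def mp_edge(p, n, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for noise variance sigma2.
    Sample eigenvalues of pure-noise data concentrate below this edge; the
    Tracy-Widom law describes fluctuations of the largest one around it."""
    return sigma2 * (1.0 + np.sqrt(p / n)) ** 2

def keep_signal_eigenvalues(X, sigma2=1.0):
    """Split sample-covariance eigenvalues into 'signal' (above the noise
    edge) and 'noise' (below it).  A full test would compare against a
    Tracy-Widom quantile; here the bulk edge itself is used as the cutoff."""
    n, p = X.shape
    S = X.T @ X / n                          # sample covariance (mean-zero data)
    vals = np.sort(np.linalg.eigvalsh(S))[::-1]
    edge = mp_edge(p, n, sigma2)
    return vals[vals > edge], edge

rng = np.random.default_rng(0)
n, p = 200, 50
noise = rng.standard_normal((n, p))
# inject one strong common factor so an eigenvalue escapes the noise bulk
signal = np.outer(rng.standard_normal(n), np.ones(p)) * 2.0
kept, edge = keep_signal_eigenvalues(noise + signal)
```

With these dimensions the bulk edge is (1 + √(50/200))² = 2.25, and the injected factor produces a top eigenvalue far above it.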

4.
Classical multivariate methods are often based on the sample covariance matrix, which is very sensitive to outlying observations. One alternative to the covariance matrix is the affine equivariant rank covariance matrix (RCM) that has been studied in Visuri et al. [2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161–185]. In this article we assume that the covariance matrix is partially known and study how to estimate the corresponding RCM. We use the properties that the RCM is affine equivariant and that the RCM is proportional to the inverse of the regular covariance matrix, and hence reduce the problem of estimating the original RCM to estimating marginal rank covariance matrices. This is a great computational advantage when the dimension of the original data vector is large.

5.
In modern scientific research, multiblock missing data arises when synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or rely heavily on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct sparse precision matrix estimation based on the structure of the block-wise missing data. Second, we impute each missing block with its conditional mean given the observed blocks. Theoretical results on variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that, compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms owing to the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of the proposed method.
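The conditional-mean imputation step can be illustrated under a joint Gaussian model. This is a hypothetical sketch with the true `(mu, Sigma)` plugged in; in the paper's procedure the required regression coefficients would come from the sparse precision-matrix estimate rather than the truth.

```python
import numpy as np

def impute_block(x_obs, mu, Sigma, obs_idx, mis_idx):
    """Fill the missing block with its Gaussian conditional mean:
        E[x_mis | x_obs] = mu_m + S_mo S_oo^{-1} (x_obs - mu_o).
    S_mo S_oo^{-1} is exactly the matrix of regression coefficients of the
    missing block on the observed blocks."""
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    return mu[mis_idx] + S_mo @ np.linalg.solve(S_oo, x_obs - mu[obs_idx])

# toy example: four variables, the last two missing
mu = np.zeros(4)
Sigma = np.array([[1.0, 0.5, 0.3, 0.1],
                  [0.5, 1.0, 0.2, 0.2],
                  [0.3, 0.2, 1.0, 0.4],
                  [0.1, 0.2, 0.4, 1.0]])
x_obs = np.array([1.0, -1.0])
imputed = impute_block(x_obs, mu, Sigma, obs_idx=[0, 1], mis_idx=[2, 3])
```

For this toy covariance the conditional mean of the missing block works out to (0.2, −0.2).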

6.
Test statistics for sphericity and identity of the covariance matrix are presented for the case where the data are multivariate normal and the dimension, p, can exceed the sample size, n. Under certain mild conditions, mainly on the traces of the unknown covariance matrix, and using the asymptotic theory of U-statistics, the test statistics are shown to follow an approximate normal distribution for large p, even when p > n. The accuracy of the statistics is shown through simulation results, particularly emphasizing the case when p can be much larger than n. A real data set is used to illustrate the application of the proposed test statistics.
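Tests of this kind are built from estimators of the trace functionals tr(Σ)/p and tr(Σ²)/p. The sketch below is a schematic plug-in version of an identity test statistic, not the exact U-statistic construction of the abstract; the bias correction shown is a standard Gaussian-style one, and the function name is illustrative.

```python
import numpy as np

def identity_test_stat(X):
    """Schematic identity-test statistic built from tr(S) and tr(S^2) of
    the sample covariance.  It estimates (1/p) tr((Sigma - I)^2), which is
    zero under H0: Sigma = I.  (This mirrors the trace functionals such
    tests are built on, not any particular paper's exact construction.)"""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (n - 1)
    t1 = np.trace(S) / p
    # bias-corrected estimate of tr(Sigma^2)/p (Gaussian-style correction)
    t2 = ((n - 1) ** 2 / ((n - 2) * (n + 1))) * (
        np.trace(S @ S) - np.trace(S) ** 2 / (n - 1)) / p
    return t2 - 2.0 * t1 + 1.0   # plug-in for (1/p) tr((Sigma - I)^2)

rng = np.random.default_rng(1)
null_val = identity_test_stat(rng.standard_normal((500, 20)))       # Sigma = I
alt_val = identity_test_stat(rng.standard_normal((500, 20)) * 2.0)  # Sigma = 4I
```

Under the null the statistic is near 0; for Σ = 4I it targets (1/p) tr((4I − I)²) = 9.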

7.
The Linear Discriminant Rule (LD) is theoretically justified for use in classification when the population within-groups covariance matrices are equal, something rarely known in practice. As an alternative, the Quadratic Discriminant Rule (QD) avoids assuming equal covariance matrices, but requires the estimation of a large number of parameters. Hence, the performance of QD may be poor if the training set sizes are small or moderate. In fact, simulation studies have shown that in the two-group case LD often outperforms QD for small training sets, even when the within-groups covariance matrices differ substantially. The present article shows this to be true when there are more than two groups as well. Thus, it seems reasonable and useful to develop a data-based method of classification that, in effect, represents a compromise between QD and LD. In this article we develop such a method based on an empirical Bayes formulation in which the within-groups covariance matrices are assumed to be outcomes of a common prior distribution whose parameters are estimated from the data. Two classification rules are developed under this framework and, through the use of extensive simulations, are compared to existing methods when the number of groups is moderate.
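The LD/QD contrast can be made concrete with plug-in Gaussian discriminant scores: QD uses each group's own covariance, while LD substitutes the pooled covariance for every group, so the log-determinant term cancels across groups. A minimal two-group sketch, with illustrative function names:

```python
import numpy as np

def fit_gaussian_rules(groups):
    """Fit the plug-in pieces of both rules from a list of (n_k, p) arrays:
    group means, per-group covariances (QD), and the pooled covariance (LD)."""
    mus = [g.mean(axis=0) for g in groups]
    covs = [np.cov(g, rowvar=False) for g in groups]
    n = sum(len(g) for g in groups)
    pooled = sum((len(g) - 1) * c for g, c in zip(groups, covs)) / (n - len(groups))
    return mus, covs, pooled

def qd_score(x, mu, cov):
    """Gaussian log-density up to a constant; passing the pooled covariance
    for every group turns this into the linear rule."""
    d = x - mu
    return -0.5 * (np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d))

def classify(x, mus, covs):
    return int(np.argmax([qd_score(x, m, c) for m, c in zip(mus, covs)]))

rng = np.random.default_rng(2)
g0 = rng.standard_normal((100, 2))
g1 = rng.standard_normal((100, 2)) + np.array([3.0, 3.0])
mus, covs, pooled = fit_gaussian_rules([g0, g1])
ld_label = classify(np.array([3.0, 3.0]), mus, [pooled, pooled])
qd_label = classify(np.array([3.0, 3.0]), mus, covs)
```

An empirical Bayes compromise of the kind the abstract describes would, in effect, shrink each `covs[k]` toward a common matrix before scoring.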

8.
Yanfang Li & Jing Lei, Statistics, 2018, 52(4): 782–800
We study high-dimensional multigroup classification from a sparse subspace estimation perspective, unifying linear discriminant analysis (LDA) with other recent developments in high-dimensional multivariate analysis that use similar tools, such as penalization methods. We develop two two-stage sparse LDA models: in the first stage, convex relaxation is used to convert two classical formulations of LDA into semidefinite programs, and the subspace perspective allows for straightforward regularization and estimation. After the initial convex relaxation, a refinement stage improves the accuracy. For the first model, a penalized quadratic program with a group lasso penalty is used for refinement, whereas a sparse version of the power method is used for the second model. We carefully examine the theoretical properties of both methods, alongside simulations and real data analysis.

9.
An approximation is given to calculate V, the covariance matrix for normal order statistics. The approximation gives considerable improvement over previous approximations, and the computing algorithm is available from the authors.

10.
The objective of this paper is to construct covariance matrix functions whose entries are compactly supported, and to use them as building blocks to formulate other covariance matrix functions for second-order vector stochastic processes or random fields. In terms of the scale mixture of compactly supported covariance matrix functions, we derive a class of second-order vector stochastic processes on the real line whose direct and cross covariance functions are of Pólya type. Then some second-order vector random fields in R^d whose direct and cross covariance functions are compactly supported are constructed by using a convolution approach and a mixture approach.

11.
It is widely believed that unlabeled data are promising for improving prediction accuracy in classification problems. Although theoretical studies exist about when and how unlabeled data are beneficial, the actual prediction improvement has not been systematically investigated for finite samples. We investigate the impact of unlabeled data in linear discriminant analysis and compare the error rates of classifiers estimated with and without unlabeled data. Our focus is on the labeling mechanism that characterizes the probabilistic structure of the occurrence of labeled cases. The results imply that even an extremely small proportion of unlabeled data can have a large effect on the analysis results.

12.
We explore the performance accuracy of the linear and quadratic classifiers for high-dimensional higher-order data, assuming that the class conditional distributions are multivariate normal with a locally doubly exchangeable covariance structure. We derive a two-stage procedure for estimating the covariance matrix: at the first stage, Lasso-based structure learning is applied to sparsify the block components within the covariance matrix. At the second stage, the maximum-likelihood estimators of all block-wise parameters are derived assuming a doubly exchangeable within-block covariance structure and a Kronecker-product-structured mean vector. We also study the effect of the block size on the classification performance in the high-dimensional setting and derive a class of asymptotically equivalent block structure approximations, in the sense that the choice of the block size is asymptotically negligible.

13.
To address the problem of heteroskedasticity in linear regression, many heteroskedasticity-consistent covariance matrix estimators have been proposed, including the HC0 estimator and its variants HC1, HC2, HC3, HC4, HC5, and HC4m. Each variant of the HC0 estimator aims at correcting its tendency to underestimate the true variances. In this paper, a new variant of the HC0 estimator, HC5m, which combines HC5 and HC4m, is proposed. Both the numerical analysis and the empirical analysis show that quasi-t inference based on HC5m is typically more reliable than inference based on the other covariance matrix estimators, regardless of the existence of high-leverage points.
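For context, the first few members of this family have simple closed forms built on the OLS residuals e_i and leverages h_i: HC0 uses e_i², HC1 rescales by n/(n−k), HC2 divides by (1−h_i), and HC3 by (1−h_i)². A sketch of these sandwich estimators (HC4, HC4m, HC5, and HC5m, which further adapt the leverage exponent, are omitted here):

```python
import numpy as np

def hc_cov(X, y, kind="HC3"):
    """Sandwich (heteroskedasticity-consistent) covariance of the OLS
    estimate.  HC0 uses squared residuals; HC1-HC3 inflate them to offset
    the underestimation tendency noted in the abstract."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat-matrix diagonal
    w = {"HC0": e**2,
         "HC1": e**2 * n / (n - k),
         "HC2": e**2 / (1 - h),
         "HC3": e**2 / (1 - h)**2}[kind]
    meat = X.T @ (w[:, None] * X)
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_normal(n) * (0.2 + x)  # heteroskedastic errors
V0 = hc_cov(X, y, "HC0")
V3 = hc_cov(X, y, "HC3")
```

Because the HC3 weights dominate the HC0 weights elementwise, the HC3 variance estimates are never smaller than the HC0 ones.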

14.
In heteroskedastic regression models, the ordinary least squares (OLS) covariance matrix estimator is inconsistent and inference is not reliable. To deal with the inconsistency, one can estimate the regression coefficients by OLS and then apply a heteroskedasticity-consistent covariance matrix (HCCM) estimator. Unfortunately, the HCCM estimator is biased. The bias is reduced by running a robust regression and using the robust residuals to compute the HCCM estimator (RHCCM). A Monte Carlo study analyzes the behavior of RHCCM and of other HCCM estimators in the presence of systematic and random heteroskedasticity and of outliers in the explanatory variables.

15.
A commonly used procedure for reducing the number of variables in linear discriminant analysis is the stepwise method for variable selection. Although often criticized, when used carefully this method can be a useful prelude to further analysis. The contribution of a variable to the discriminatory power of the model is usually measured by the maximum likelihood ratio criterion, referred to as Wilks' lambda. It is well known that the Wilks' lambda statistic is extremely sensitive to the influence of outliers. In this work a robust version of the Wilks' lambda statistic is constructed based on the Minimum Covariance Determinant (MCD) estimator and its reweighted version, which has higher efficiency. Taking advantage of the availability of a fast algorithm for computing the MCD, a simulation study is carried out to evaluate the performance of this statistic. The presentation of material in this article does not imply the expression of any opinion whatsoever on the part of Austro Control GmbH and is the sole responsibility of the authors.
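The classical statistic is a ratio of generalized variances, Λ = det(W)/det(T), with W the pooled within-group scatter and T the total scatter; the robust version of the abstract replaces these by MCD-based counterparts. A sketch of the classical version:

```python
import numpy as np

def wilks_lambda(groups):
    """Classical Wilks' lambda: det(W) / det(T).  Values near 0 indicate
    strong group separation; values near 1 indicate none.  The robust
    variant would swap in MCD-based location/scatter estimates."""
    all_x = np.vstack(groups)
    grand = all_x.mean(axis=0)
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    T = (all_x - grand).T @ (all_x - grand)
    return np.linalg.det(W) / np.linalg.det(T)

rng = np.random.default_rng(4)
separated = [rng.standard_normal((60, 3)),
             rng.standard_normal((60, 3)) + 5.0]
overlapping = [rng.standard_normal((60, 3)),
               rng.standard_normal((60, 3))]
lam_sep = wilks_lambda(separated)
lam_overlap = wilks_lambda(overlapping)
```

Since T = W + B with B positive semi-definite, Λ always lies in (0, 1].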

16.
A good parametric spectral estimator requires an accurate estimate of the sum of the AR coefficients; however, a criterion that minimizes the innovation variance does not necessarily yield the best spectral estimate. This paper develops an alternative information criterion that accounts for the bias in the sum of the parameters of the autoregressive estimator of the spectral density at frequency zero.
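The sensitivity to the sum of the AR coefficients is visible directly in the formula for the AR(p) spectral density at frequency zero, f(0) = σ²/(2π(1 − Σφ_j)²): f(0) depends on the coefficients only through their sum, and a small bias in that sum is amplified when it is close to one. A numerical illustration (the coefficient values are arbitrary):

```python
import numpy as np

def ar_spectrum_at_zero(phi, sigma2):
    """Spectral density of an AR(p) process at frequency zero:
        f(0) = sigma2 / (2*pi * (1 - sum(phi))**2).
    A small bias in sum(phi) can dominate the estimate of f(0) even when
    the innovation variance sigma2 is estimated well."""
    return sigma2 / (2.0 * np.pi * (1.0 - np.sum(phi)) ** 2)

f_true = ar_spectrum_at_zero([0.5, 0.3], sigma2=1.0)       # sum of phi = 0.80
f_biased = ar_spectrum_at_zero([0.48, 0.27], sigma2=1.0)   # sum of phi = 0.75
```

Here a bias of only 0.05 in the coefficient sum changes f(0) by the factor (0.25/0.20)² ≈ 1.56.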

17.
18.
Fisher's linear discriminant function, adapted by Anderson for allocating new observations to one of two existing groups, is considered in this paper. Methods of estimating the misclassification error rates are reviewed and evaluated by Monte Carlo simulations. The investigation is carried out under both ideal (multivariate normal data) and non-ideal (multivariate binary data) conditions. The assessment is based on the usual mean square error (MSE) criterion and also on a new criterion of optimism. The results show that although there is a common cluster of good estimators for both ideal and non-ideal conditions, the single best estimators vary with respect to the different criteria.
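Two of the error-rate estimators typically compared in such Monte Carlo studies, the resubstitution (apparent) error and the leave-one-out error, can be sketched for the sample linear discriminant rule as follows; the function names are illustrative.

```python
import numpy as np

def anderson_rule(x, m1, m2, S_inv):
    """Allocate x to group 1 iff the sample linear discriminant is positive."""
    return (m1 - m2) @ S_inv @ (x - 0.5 * (m1 + m2)) > 0

def error_rates(X1, X2):
    """Resubstitution (apparent) error and leave-one-out error for the
    two-group sample discriminant rule."""
    def fit(A, B):
        S = ((len(A) - 1) * np.cov(A, rowvar=False)
             + (len(B) - 1) * np.cov(B, rowvar=False)) / (len(A) + len(B) - 2)
        return A.mean(axis=0), B.mean(axis=0), np.linalg.inv(S)

    m1, m2, S_inv = fit(X1, X2)
    resub = (np.mean([not anderson_rule(x, m1, m2, S_inv) for x in X1])
             + np.mean([anderson_rule(x, m1, m2, S_inv) for x in X2])) / 2

    errs = []                      # refit with each observation held out
    for i in range(len(X1)):
        m1_, m2_, Si = fit(np.delete(X1, i, axis=0), X2)
        errs.append(not anderson_rule(X1[i], m1_, m2_, Si))
    for i in range(len(X2)):
        m1_, m2_, Si = fit(X1, np.delete(X2, i, axis=0))
        errs.append(anderson_rule(X2[i], m1_, m2_, Si))
    return resub, float(np.mean(errs))

rng = np.random.default_rng(6)
X1 = rng.standard_normal((40, 2))
X2 = rng.standard_normal((40, 2)) + 1.5
resub, loo = error_rates(X1, X2)
```

The resubstitution estimate is typically optimistic (too small), which is one motivation for the "optimism" criterion the abstract mentions.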

19.
20.
Classification of gene expression microarray data is important in the diagnosis of diseases such as cancer, but the analysis of microarray data often presents difficult challenges because the gene expression dimension is typically much larger than the sample size. Consequently, classification methods for microarray data often rely on regularization techniques to stabilize the classifier for improved classification performance. In particular, numerous regularization techniques, such as covariance-matrix regularization, are available, which, in practice, leads to a difficult choice among regularization methods. In this paper, we compare the classification performance of five covariance-matrix regularization methods applied to the linear discriminant function, using two simulated high-dimensional data sets and five well-known, high-dimensional microarray data sets. In our simulation study, we found the minimum distance empirical Bayes method reported in Srivastava and Kubokawa [Comparison of discrimination methods for high dimensional data, J. Japan Statist. Soc. 37(1) (2007), pp. 123–134] and the new linear discriminant analysis reported in Thomaz, Kitani, and Gillies [A maximum uncertainty LDA-based approach for limited sample size problems – with application to face recognition, J. Braz. Comput. Soc. 12(1) (2006), pp. 1–12] to perform consistently well and often outperform three other prominent regularization methods. Finally, we conclude with some recommendations for practitioners.
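A generic example of covariance-matrix regularization for the linear discriminant function is ridge-style shrinkage of the pooled covariance toward a scaled identity, which keeps the rule well defined when the dimension exceeds the sample size. This is one illustrative scheme, not any of the five methods compared in the paper; the function name and the form of the shrinkage target are assumptions.

```python
import numpy as np

def regularized_lda_direction(X0, X1, gamma=0.5):
    """Linear discriminant direction with a ridge-regularized pooled
    covariance, S_gamma = (1 - gamma) S + gamma * (tr(S)/p) I.  With p > n
    the plain pooled S is singular, but S_gamma is invertible."""
    n0, n1 = len(X0), len(X1)
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    p = S.shape[0]
    S_gamma = (1 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)
    return np.linalg.solve(S_gamma, X1.mean(axis=0) - X0.mean(axis=0))

# p > n situation: 5 + 5 samples, 20 "genes"; plain pooled S is singular
rng = np.random.default_rng(5)
p = 20
X0 = rng.standard_normal((5, p))
X1 = rng.standard_normal((5, p))
X1[:, 0] += 4.0                      # one differentially expressed coordinate
w = regularized_lda_direction(X0, X1, gamma=0.5)
```

Because S_gamma is positive definite, the projection of the mean difference onto the resulting direction is always positive, so the two training means are separated.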
