Similar Documents
20 similar documents found
1.
The common principal components (CPC) model provides a way to model the population covariance matrices of several groups by assuming a common eigenvector structure. When appropriate, this model can provide covariance matrix estimators whose elements have smaller standard errors than either the pooled covariance matrix or the per-group unbiased sample covariance matrix estimators. In this article, a regularized CPC estimator is proposed under the assumption of a common (or partially common) eigenvector structure in the populations. After estimation of the common eigenvectors using the Flury–Gautschi (or another) algorithm, the off-diagonal elements of the nearly diagonalized covariance matrices are shrunk towards zero, and the result is pre- and post-multiplied by the orthogonal common eigenvector matrix to obtain the regularized CPC covariance matrix estimates. The optimal shrinkage intensity per group can be estimated using cross-validation. The efficiency of these estimators relative to the pooled and unbiased estimators is investigated in a Monte Carlo simulation study, and the regularized CPC estimator is applied to a real dataset to demonstrate the utility of the method.
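The shrinkage step described above can be sketched numerically. This is a minimal sketch under assumptions: as a stand-in for the Flury–Gautschi algorithm it uses the eigenvectors of the pooled covariance, and the per-group shrinkage intensities (chosen by cross-validation in the article) are fixed constants here; `regularized_cpc` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups sharing eigenvectors but with different eigenvalues.
Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))
S1 = Q_true @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q_true.T
S2 = Q_true @ np.diag([1.0, 3.0, 0.8, 2.0]) @ Q_true.T
X1 = rng.multivariate_normal(np.zeros(4), S1, size=60)
X2 = rng.multivariate_normal(np.zeros(4), S2, size=60)

def regularized_cpc(samples, shrink):
    covs = [np.cov(X, rowvar=False) for X in samples]
    # Stand-in for Flury-Gautschi: eigenvectors of the pooled covariance.
    Q = np.linalg.eigh(sum(covs) / len(covs))[1]
    out = []
    for S, lam in zip(covs, shrink):
        F = Q.T @ S @ Q  # nearly diagonal in the common basis
        # Shrink the off-diagonal elements towards zero.
        F_shrunk = (1 - lam) * F + lam * np.diag(np.diag(F))
        out.append(Q @ F_shrunk @ Q.T)  # back-transform to a covariance estimate
    return out

est1, est2 = regularized_cpc([X1, X2], shrink=[0.5, 0.5])
```

Since each shrunk matrix is a convex combination of a positive definite matrix and its (positive) diagonal, the resulting estimates remain positive definite.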

2.
Econometric techniques for estimating output supply systems, factor demand systems and consumer demand systems often require estimating a nonlinear system of equations with an additive error structure when written in reduced form. To calculate the covariance matrix of the ML estimator of this nonlinear system, one can either invert the Hessian of the concentrated log-likelihood function, or invert the matrix obtained by pre- and post-multiplying the inverted MLE of the disturbance covariance matrix by the Jacobian of the reduced-form model. Malinvaud has shown that the latter method yields the covariance matrix of the actual limiting distribution, while Barnett has shown that the former is only an approximation.

In this paper, we use a Monte Carlo simulation study to determine how these two covariance matrices differ with respect to the nonlinearity of the model, the number of observations in the dataset, and the residual process. We find that the covariance matrix calculated from the Hessian of the concentrated likelihood function produces Wald statistics that are distributed above those calculated with the other covariance matrix. This difference becomes insignificant as the sample size increases to one hundred or more observations, suggesting that the asymptotic behaviour of the two covariance matrices is reached quickly.
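The two covariance estimates being compared can be illustrated on a toy nonlinear model. This is a sketch under assumed specifics (a single-parameter model y = exp(b·x) + e with normal errors, a grid search for the MLE, and a central finite difference for the Hessian), not the paper's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 1.0, n)
y = np.exp(0.7 * x) + 0.1 * rng.standard_normal(n)  # y = exp(b*x) + e, true b = 0.7

def ssr(b):
    r = y - np.exp(b * x)
    return float(r @ r)

def conc_loglik(b):
    # Concentrated log-likelihood: sigma^2 profiled out as SSR(b)/n.
    return -0.5 * n * np.log(ssr(b) / n)

# Crude grid search for the MLE of b.
grid = np.linspace(0.0, 1.5, 20001)
b_hat = grid[np.argmin([ssr(b) for b in grid])]

# Method 1: invert the numerical Hessian of the negative concentrated log-likelihood.
h = 1e-3
hess = -(conc_loglik(b_hat + h) - 2.0 * conc_loglik(b_hat) + conc_loglik(b_hat - h)) / h**2
var_hessian = 1.0 / hess

# Method 2: the Jacobian-based form, sigma2_hat * (J'J)^{-1}.
J = x * np.exp(b_hat * x)      # Jacobian of the reduced form w.r.t. b
sigma2_hat = ssr(b_hat) / n    # MLE of the disturbance variance
var_jacobian = sigma2_hat / float(J @ J)
```

At the MLE the two quantities agree to first order, so with n = 200 the estimated variances are close; the paper's point is that in small samples the Wald statistics they produce can differ noticeably.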


4.
In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table, and these matrices may be used both for prediction and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for “simple structure.” These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization (NMF) is an attractive alternative. Rather than attempting to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in the interpretability of the factors. We illustrate this potential with synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for tasks such as ordination, clustering, and prediction. Supplementary materials for this article are available online.
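A minimal sketch of NMF itself, using the well-known Lee-Seung multiplicative updates for the Frobenius-norm objective (the abstract does not commit to a particular algorithm, and the dimensions and noise level here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonnegative data with an approximate rank-2 structure.
W_true = rng.uniform(0, 1, (30, 2))
H_true = rng.uniform(0, 1, (2, 8))
V = W_true @ H_true + 0.01 * rng.uniform(0, 1, (30, 8))

def nmf(V, rank, n_iter=500, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - WH||_F with W, H >= 0."""
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, (m, rank))
    H = rng.uniform(0.1, 1.0, (rank, n))
    for _ in range(n_iter):
        # Elementwise updates keep every entry nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(V, rank=2)
recon_error = float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Because the updates are multiplicative, nonnegativity is preserved at every step rather than imposed after the fact, which is exactly the contrast with sign-mixed PCA factors drawn in the abstract.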

5.
Classical multivariate methods are often based on the sample covariance matrix, which is very sensitive to outlying observations. One alternative to the covariance matrix is the affine equivariant rank covariance matrix (RCM) that has been studied in Visuri et al. [2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161–185]. In this article we assume that the covariance matrix is partially known and study how to estimate the corresponding RCM. We use the properties that the RCM is affine equivariant and that the RCM is proportional to the inverse of the regular covariance matrix, and hence reduce the problem of estimating the original RCM to estimating marginal rank covariance matrices. This is a great computational advantage when the dimension of the original data vector is large.

6.
To estimate a high-dimensional covariance matrix, row sparsity is often assumed, such that each row has a small number of nonzero elements. However, in some applications, such as factor modeling, there may be many nonzero loadings on the common factors. The corresponding variables are then correlated with one another and the rows are non-sparse, or dense. This paper has three main aims. First, a detection method is proposed to identify the rows that may be non-sparse, or at least dense with many nonzero elements. These rows are called dense rows and the corresponding variables are called pivotal variables. Second, to determine the number of dense rows, a ridge ratio method is suggested, which can be regarded as a sure screening procedure. Third, to handle the estimation of high-dimensional factor models, a two-step procedure is suggested with the above screening as the first step. Simulations are conducted to examine the performance of the new method, and a real dataset is analyzed for illustration.
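The flavour of a ridge ratio can be sketched on eigenvalues of a simulated factor model: the ratio of consecutive ridged eigenvalues dips sharply right after the last strong one. This is only an illustration of the idea; the paper's exact statistic and ridge constant may differ, and the constant `c` below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, K_true = 300, 40, 3

# Simulated factor data: 3 strong common factors plus unit-variance noise.
B = rng.standard_normal((p, K_true))
F = rng.standard_normal((n, K_true))
X = F @ B.T + rng.standard_normal((n, p))

# Eigenvalues of the sample covariance, sorted in decreasing order.
lam = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

# Ridge ratio: (lam[k+1] + c) / (lam[k] + c) is smallest at the gap between
# the factor eigenvalues and the noise eigenvalues; argmin estimates the count.
c = 1.0 / p                                # small ridge to stabilise tiny eigenvalues
ratios = (lam[1:11] + c) / (lam[:10] + c)  # ratios for k = 1, ..., 10
k_hat = int(np.argmin(ratios)) + 1
```

With three strong factors the first three eigenvalues are of order p while the rest stay near one, so the ratio at position three is far below all others.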

7.
The paper develops a method from which algorithms can be constructed to numerically compute error-free (free from computer roundoff error) generalized inverses and solutions to linear least squares problems having rational entries. A multiple modulus system is used to avoid error accumulation that is inherent in the floating-point number system. Some properties of finite fields of characteristic p, GF(p), are used in conjunction with a bordering method for matrix inversion to find nonsingular minors of a matrix over the field of rational numbers.
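The exact-arithmetic idea can be sketched with a single prime modulus: every operation is modular integer arithmetic, so no floating-point roundoff can accumulate. A full multiple-modulus system would repeat this for several primes and combine the results (e.g. via the Chinese remainder theorem); the Gauss-Jordan routine below, `matinv_gfp`, is an illustrative stand-in for the paper's bordering method.

```python
P = 1_000_003  # one prime modulus of a multiple-modulus system

def inv_mod(a, p=P):
    # Multiplicative inverse in GF(p) via Fermat's little theorem.
    return pow(a, p - 2, p)

def matinv_gfp(A, p=P):
    """Invert an integer matrix over GF(p) by exact Gauss-Jordan elimination."""
    n = len(A)
    # Augment with the identity and reduce, all arithmetic mod p.
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        s = inv_mod(M[col][col], p)
        M[col] = [x * s % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]
A_inv = matinv_gfp(A)
```

Multiplying A by the computed inverse modulo P returns the identity exactly, which is the "error-free" property the paper exploits.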

8.
Incomplete growth curve data often result from missing or mistimed observations in a repeated measures design. Virtually all methods of analysis rely on estimates of the dispersion matrix. A Monte Carlo simulation was used to compare three methods of estimating dispersion matrices for incomplete growth curve data: 1) maximum likelihood estimation with a smoothing algorithm, which finds the closest positive semidefinite estimate of the pairwise estimated dispersion matrix; 2) a mixed effects model using the EM (expectation-maximization) algorithm; and 3) a mixed effects model with the scoring algorithm. The simulation included 5 dispersion structures, 20 or 40 subjects with 4 or 8 observations per subject, and 10% or 30% missing data. In all the simulations, the smoothing algorithm was the poorest estimator of the dispersion matrix. In most cases, there were no significant differences between the scoring and EM algorithms. The EM algorithm tended to be better than the scoring algorithm when the variances of the random effects were close to zero, especially for the simulations with 4 observations per subject and two random effects.
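The first method (pairwise estimation followed by a positive semidefinite smoothing step) can be sketched as follows, using eigenvalue clipping as the smoothing step; the dimensions, missingness rate, and clipping choice are illustrative assumptions, not the study's exact settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Growth-curve-like data: 40 subjects, 6 time points, roughly 20% missing at random.
true_cov = 0.5 + 0.5 * np.eye(6)
X = rng.multivariate_normal(np.zeros(6), true_cov, size=40)
X[rng.uniform(size=X.shape) < 0.2] = np.nan

def pairwise_cov(X):
    """Covariance from all pairwise-complete observations (may be indefinite)."""
    p = X.shape[1]
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            xi, xj = X[ok, i], X[ok, j]
            S[i, j] = np.mean((xi - xi.mean()) * (xj - xj.mean()))
    return S

def nearest_psd(S):
    """Smoothing step: clip negative eigenvalues to zero."""
    vals, vecs = np.linalg.eigh((S + S.T) / 2)
    return vecs @ np.diag(np.clip(vals, 0, None)) @ vecs.T

S_smooth = nearest_psd(pairwise_cov(X))
```

The pairwise matrix uses different subject subsets per entry and so need not be positive semidefinite; the clipping step restores that property at the cost of some bias, which is consistent with the study finding this estimator the poorest of the three.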

9.
Covariance matrices, or more generally matrices of sums of squares and cross-products, are used as input to many multivariate analysis techniques. The eigenvalues of these matrices play an important role in the statistical analysis of data, including estimation and hypothesis testing. It has been recognized that one or a few observations can exert an undue influence on the eigenvalues of a covariance matrix. The relationship between the eigenvalues of the covariance matrix computed from all the data and the eigenvalues of the perturbed covariance matrix (a covariance matrix computed after a small subset of the observations has been deleted) cannot in general be written in closed form. Two methods for approximating the eigenvalues of a perturbed covariance matrix have been suggested by Hadi (1988) and Wang and Nyquist (1991) for the case of perturbation by a single observation. In this paper we improve on these two methods and give some additional theoretical results that may provide further insight into the problem. We also compare the two improved approximations in terms of their accuracy.
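The flavour of such approximations can be seen with the generic first-order perturbation expansion, lambda_i(S + Delta) = lambda_i + v_i' Delta v_i + higher-order terms; the specific approximations of Hadi and of Wang and Nyquist refine this. A numerical check on simulated data, with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((50, 3)) @ np.diag([3.0, 1.0, 0.3])

S = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(S)

# Delete observation k; the perturbation is the change in the covariance matrix.
k = 7
S_del = np.cov(np.delete(X, k, axis=0), rowvar=False)
Delta = S_del - S

# First-order approximation: add the quadratic form v_i' Delta v_i to each eigenvalue.
approx = vals + np.einsum('ji,jk,ki->i', vecs, Delta, vecs)
exact = np.linalg.eigvalsh(S_del)
max_rel_err = float(np.max(np.abs(approx - exact) / exact))
```

Deleting one of 50 observations makes Delta small, so the neglected second-order terms are tiny and the approximation tracks the exact perturbed eigenvalues closely without recomputing an eigendecomposition.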

10.
The decomposition of a matrix as the product of a lower triangular matrix with ones on the diagonal and an upper triangular matrix is useful for solving systems of linear equations. For a given nonsingular matrix, this type of decomposition is unique, and algorithms exist to obtain the two factors. However, in certain problems the factorization of the inverse matrix may be of interest. This note presents an algorithm for factoring the inverse matrix using simple operations on elements of the original matrix. As examples, we give factorizations for several well-known and widely used correlation matrices. The usefulness and practicality of these factorizations are demonstrated in an application of statistical modeling using unbiased estimating equations.

11.
By analyzing a special class of regression problems we point out that previously suggested estimates of the covariance matrix of regression M-estimators are inadequate for certain design matrices. These results confirm the conclusions drawn in several Monte Carlo studies.

12.
Comparison of Four New General Classes of Search Designs
A factor screening experiment identifies a few important factors from a large list of factors that potentially influence the response. If a list consists of m factors, each at three levels, a design is a subset of all 3^m possible runs. This paper considers the problem of finding designs with small numbers of runs, using the search linear model introduced in Srivastava (1975). The paper presents four new general classes of these 'search designs', each with 2^m − 1 runs, which permit at most two important factors out of the m factors to be searched for and identified. The paper compares the designs for 4 ≤ m ≤ 10, using arithmetic and geometric means of the determinants, traces and maximum characteristic roots of particular matrices. Two of the designs are found to be superior under all six criteria studied. The four designs are identical for m = 3, and this design is optimal in the class of all search designs under the six criteria. The four designs are also identical for m = 4 under some row and column permutations.

13.
Multivariate analysis techniques are applied to the two-period repeated measures crossover design. The approach considered in this paper has the advantage over the univariate approach recently proposed by Wallenstein and Fisher (1977) that it does not require any specific structure for the variance-covariance matrix of the repeated measures factor. (It should be noted that sums and differences of observations over periods are used for all tests; there are therefore two matrices under consideration, one for sums and one for differences.) Tests of significance are derived using Wilks' criterion, and the procedure is illustrated with a numerical example from the area of clinical trials.

14.
Necessary and sufficient conditions are given for the covariance structure of all the observations in a multivariate factorial experiment under which certain multivariate quadratic forms are independent and distributed as a constant times a Wishart. It is also shown that exact multivariate test statistics can be formed for certain covariance structures of the observations when the assumption of equal covariance matrices for each normal population is relaxed. A characterization is given for the dependency structure between random vectors in which the sample mean and sample covariance matrix have certain properties.

15.
This paper describes a permutation procedure to test for the equality of selected elements of a covariance or correlation matrix across groups. It involves either centring or standardising each variable within each group before randomly permuting observations between groups. Since the assumption of exchangeability of observations between groups does not strictly hold following such transformations, Monte Carlo simulations were used to compare expected and empirical rejection levels as a function of group size, the number of groups and distribution type (Normal, mixtures of Normals, and Gamma with various values of the shape parameter). The Monte Carlo study showed that the estimated probability levels are close to those that would be obtained with an exact test, except at very small sample sizes (5 or 10 observations per group). The test appears robust against non-normal data, different numbers of groups or variables per group, and unequal sample sizes per group. Power increased with sample size, effect size and the number of elements in the matrix, and decreased with increasingly unequal numbers of observations per group.
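The centring-then-permuting procedure can be sketched for two groups and a single covariance element. This is a minimal sketch under assumptions (two groups, the absolute difference of one covariance element as the test statistic, 999 permutations); `perm_test_cov_element` is an illustrative name, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two groups with the same covariance but different means.
g1 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=40)
g2 = rng.multivariate_normal([5, 5], [[1.0, 0.5], [0.5, 1.0]], size=40)

def perm_test_cov_element(groups, i, j, n_perm=999):
    # Centre each variable within each group so only covariance structure remains.
    centred = [g - g.mean(axis=0) for g in groups]
    sizes = [len(g) for g in centred]

    def stat(gs):
        # Absolute difference of the (i, j) covariance element (two groups assumed).
        covs = [np.cov(g, rowvar=False)[i, j] for g in gs]
        return abs(covs[0] - covs[1])

    observed = stat(centred)
    pooled = np.vstack(centred)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        split = np.split(pooled[perm], np.cumsum(sizes)[:-1])
        count += stat(split) >= observed
    return (count + 1) / (n_perm + 1)

p_value = perm_test_cov_element([g1, g2], 0, 1)
```

Centring removes the group mean differences before pooling, which is exactly the step that makes exchangeability only approximately true and motivates the paper's Monte Carlo check of rejection levels.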

16.
This paper proposes a hierarchical probabilistic model for ordinal matrix factorization. Unlike previous approaches, we model the ordinal nature of the data and take a principled approach to incorporating priors for the hidden variables. Two algorithms are presented for inference, one based on Gibbs sampling and one based on variational Bayes. Importantly, these algorithms can be applied to the factorization of very large matrices with missing entries.

17.
For regression models with quantitative factors it is illustrated that the E-optimal design can be extremely inefficient, in the sense that it degenerates to a design that takes all observations at only one point. This phenomenon is caused by the differing sizes of the elements in the covariance matrix of the least-squares estimator for the unknown parameters. For this reason we propose to replace the E-criterion by a corresponding standardized version. The advantage of this approach is demonstrated for polynomial regression on a nonnegative interval, where the classical and standardized E-optimal designs can be found explicitly. The described phenomena are not restricted to the E-criterion but appear for nearly all optimality criteria proposed in the literature; standardization is therefore recommended for optimal experimental design in regression models with quantitative factors. The optimal designs with respect to the new standardized criteria satisfy an invariance property similar to that of the famous D-optimal designs, which allows easy calculation of standardized optimal designs on many linearly transformed design spaces.

18.
The Extended Growth Curve model is considered. It turns out that the estimated mean of the model is the projection of the observations on the space generated by the design matrices, which is the sum of two tensor product spaces. The orthogonal complement of this space is decomposed into four orthogonal spaces, and residuals are defined by projecting the observation matrix onto the resulting components. The residuals are interpreted, and some remarks are given as to why ordinary residuals should not be used, what kind of information our residuals give, and how this information might be used to validate model assumptions and detect outliers and influential observations. It is shown that the residuals are symmetrically distributed around zero and are uncorrelated with each other. The covariance between the residuals and the estimated model, as well as the dispersion matrices of the residuals, are also given.

19.
This paper surveys some useful matrix transformations that simplify the derivation of GLS as WLS in an error component model. This is particularly important for large panel data applications, where brute-force inversion of large data matrices may not be feasible. This WLS transformation is known in the literature as the Fuller and Battese (1974) transformation; its extensions to error component models with heteroscedasticity, serial correlation, unbalancedness, and to sets of seemingly unrelated regressions are also considered.
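For a balanced one-way error component model with known variance components, the Fuller and Battese quasi-demeaning transformation reduces GLS to OLS on transformed data, so no NT x NT inversion is needed. A minimal numerical check (simulated data and arbitrary parameter values; the brute-force GLS below is included only to verify the equivalence):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 30, 5                      # N panel units, T periods (balanced)
sigma_mu, sigma_nu = 1.0, 0.5     # variance components, treated as known

x = rng.standard_normal(N * T)
mu = np.repeat(rng.standard_normal(N) * sigma_mu, T)  # unit-specific effects
y = 2.0 + 1.5 * x + mu + sigma_nu * rng.standard_normal(N * T)
X = np.column_stack([np.ones(N * T), x])

# Fuller-Battese: quasi-demean with theta, then run OLS (GLS as WLS).
theta = 1.0 - sigma_nu / np.sqrt(T * sigma_mu**2 + sigma_nu**2)
group = np.repeat(np.arange(N), T)
y_star = y - theta * np.array([y[group == g].mean() for g in group])
X_star = X - theta * np.vstack([X[group == g].mean(axis=0) for g in group])
beta_fb = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Brute-force GLS with the full NT x NT covariance, for comparison only.
Omega = np.kron(np.eye(N), sigma_mu**2 * np.ones((T, T)) + sigma_nu**2 * np.eye(T))
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)
```

The quasi-demeaned OLS and full GLS coefficients coincide exactly in the balanced case, which is the computational point of the transformation: only group means are needed, never the inverse of the full disturbance covariance.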

20.
The variance-covariance matrix plays a central role in the inferential theory of high-dimensional factor models in finance and economics. Popular regularization methods that directly exploit sparsity are not applicable to many financial problems. Classical methods of estimating the covariance matrix are based on strict factor models, which assume independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming a sparse error covariance matrix, we allow cross-sectional correlation even after taking out the common factors, which enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique of Cai and Liu (2011), taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.
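A sketch of the factor-plus-thresholding idea (in the spirit of the approach described, not its exact estimator): estimate the common component by principal components, then hard-threshold the off-diagonal entries of the residual covariance. The threshold value, dimensions, and the simple hard threshold (rather than adaptive thresholding) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, K = 400, 20, 2               # observations, dimension, number of factors

# Factor structure plus idiosyncratic noise.
B = rng.standard_normal((p, K))
F = rng.standard_normal((n, K))
X = F @ B.T + 0.5 * rng.standard_normal((n, p))

def factor_threshold_cov(X, K, thresh):
    """Principal-components factor estimate plus a thresholded residual covariance."""
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    # The leading K principal components carry the common factors.
    lead = vecs[:, -K:] * np.sqrt(vals[-K:])
    S_factor = lead @ lead.T
    R = S - S_factor                      # residual (idiosyncratic) covariance
    off = R - np.diag(np.diag(R))
    off[np.abs(off) < thresh] = 0.0       # hard-threshold the off-diagonal entries
    return S_factor + np.diag(np.diag(R)) + off

S_hat = factor_threshold_cov(X, K=2, thresh=0.1)
```

Thresholding is applied only to the residual part, so the dense low-rank component from the common factors is preserved while spurious small cross-sectional correlations in the idiosyncratic part are zeroed out.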
