20 similar documents found; search time: 31 ms
1.
The idea of searching for orthogonal projections from a multidimensional space into a linear subspace, as an aid to detecting non-linear structure, has been named exploratory projection pursuit. Most approaches are tied to the idea of searching for interesting projections; typically, an interesting projection is one where the distribution of the projected data differs from the normal distribution. In this paper we define two projection indices aimed specifically at finding projections that best show grouped structure in the plane, if such structure exists in the multidimensional space. These indices involve a numerical optimization problem that is tackled in two stages, the projection and the pursuit: the first is based on a procedure to generate pseudo-random rotation matrices in the sense of the grand tour of D. Asimov (1985), and the second is a local numerical optimization procedure. One artificial and one real example illustrate the performance of the suggested indices.
2.
The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward, because projections are smoothing operations: structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed, based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects such as clustering, so to this end the data are first sphered to remove all of the location, scale and correlational structure. A set of simulated bivariate data and data on the artistic qualities of painters are used as examples.
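The sphering step mentioned in this abstract can be sketched in a few lines. This is only an illustrative implementation of the standard whitening transform, not the authors' code: center the data, then multiply by the inverse square root of the sample covariance matrix.

```python
import numpy as np

def sphere(X):
    """Sphere the data: remove location, scale and correlational
    structure so the result has zero mean and identity covariance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # remove location
    cov = np.cov(Xc, rowvar=False)             # sample covariance
    vals, vecs = np.linalg.eigh(cov)
    # symmetric inverse square root of the covariance matrix
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Xc @ W

rng = np.random.default_rng(0)
raw = rng.multivariate_normal([1.0, -2.0], [[4.0, 1.5], [1.5, 2.0]], size=500)
Z = sphere(raw)   # Z has (numerically) zero mean and identity covariance
```

Any remaining structure in `Z`, such as clustering or multimodality, is then genuinely non-linear rather than an artifact of location, scale, or correlation.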
3.
《Journal of statistical planning and inference》1998,67(2):227-245
In this paper, a notion of generalized inner product spaces is introduced to study optimal estimating functions. The basic technique involves an idea of orthogonal projection first introduced by Small and McLeish (1988, 1989, 1991, 1992, 1994). A characterization of orthogonal projections in generalized inner product spaces is given. It is shown that the orthogonal projection of the score function onto a linear subspace of estimating functions is optimal in that subspace, and a characterization of optimal estimating functions is given. As special cases of the main results, we derive the results of Godambe (1985) on the foundations of estimation in stochastic processes, the result of Godambe and Thompson (1989) on the extension of quasi-likelihood, and the generalized estimating equations for multivariate data due to Liang and Zeger (1986). We also derive optimal estimating functions in the Bayesian framework.
4.
Henri Caussinus 《Statistical Methods and Applications》1992,1(1):51-65
Summary: Several techniques for exploring an n×p data set are considered in the light of the statistical framework: data = structure + noise. The first application is to Principal Component Analysis (PCA), in fact generalized PCA with any metric M on the unit space ℝ^p. A natural model for supporting this analysis is the fixed-effect model, where the expectation of each unit is assumed to belong to some q-dimensional linear manifold defining the structure, while the variance describes the noise. The best estimation of the structure is obtained for a proper choice of metric M and dimensionality q; guidelines are provided for both choices in Section 2. The second application is to Projection Pursuit, which aims to reveal structure in the original data by means of suitable low-dimensional projections of them. We suggest the use of generalized PCA with a suitable metric M as a Projection Pursuit technique. According to the kind of structure that is looked for, two such metrics are proposed in Section 3. Finally, the analysis of n×p contingency tables is considered in Section 4. Since the data are frequencies, we assume a multinomial or Poisson model for the noise. Several models may be considered for the structural part: Correspondence Analysis rests on one of them, spherical factor analysis on another, and Goodman association models provide a further alternative. These different approaches are discussed and compared from several points of view.
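The generalized PCA with metric M mentioned above admits a compact formulation. The following is a sketch under one common convention (working with the symmetric form M^{1/2} S M^{1/2}), not the paper's own implementation; with M = I it reduces to ordinary PCA.

```python
import numpy as np

def generalized_pca(X, M, q):
    """Generalized PCA of an n x p data set with a metric M on R^p.
    Eigen-decompose M^{1/2} S M^{1/2}; mapping the eigenvectors back
    through M^{-1/2} gives M-orthonormal principal axes."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(M)
    Mh = vecs @ np.diag(np.sqrt(vals)) @ vecs.T        # M^{1/2}
    Mih = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T  # M^{-1/2}
    w, V = np.linalg.eigh(Mh @ S @ Mh)
    order = np.argsort(w)[::-1][:q]
    axes = Mih @ V[:, order]     # M-orthonormal: axes.T @ M @ axes = I
    scores = Xc @ M @ axes
    return w[order], axes, scores
```

The choice of M is exactly what Section 3 of the abstract is about: different metrics make different kinds of departure from the noise model visible.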
5.
Summary: In this paper the projection approach of Runger (1996) is applied to construct control charts for a multivariate process. It is assumed that a shift in the mean might only occur in a known subspace of the parameter space. The projection method permits a reduction of the dimensionality of the control problem. Several control schemes based on projections are introduced; we consider CUSUM-type charts as well as EWMA schemes. The underlying variables are assumed to be independent and normally distributed. Using the average run length, all control charts are compared with each other. Moreover, it is analyzed how sensitively the charts react to a false choice of the subspace.
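An EWMA scheme of the kind referred to above can be sketched as follows. This is a generic textbook EWMA recursion for standardized i.i.d. N(0,1) observations (e.g. a multivariate observation already projected onto a known direction), not the specific charts of the paper.

```python
import numpy as np

def ewma_chart(x, lam=0.2, L=3.0):
    """EWMA recursion z_t = lam*x_t + (1-lam)*z_{t-1}, started at the
    in-control mean 0, with exact time-varying control limits
    +/- L * sigma_z(t) for i.i.d. N(0,1) observations.
    Returns the EWMA statistics and a per-step out-of-control flag."""
    z = np.empty(len(x))
    flags = np.empty(len(x), dtype=bool)
    prev = 0.0
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev
        z[t] = prev
        # exact variance of the EWMA statistic at step t (1-indexed t+1)
        var = lam / (2 - lam) * (1 - (1 - lam) ** (2 * (t + 1)))
        flags[t] = abs(prev) > L * np.sqrt(var)
    return z, flags
```

In the setting of the abstract, `x` would be the projection of each multivariate observation onto the assumed shift subspace; the average run length is then estimated by simulating the time until the first flag.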
6.
We study high-dimensional multigroup classification from a sparse subspace estimation perspective, unifying linear discriminant analysis (LDA) with other recent developments in high-dimensional multivariate analysis that use similar tools, such as penalization. We develop two two-stage sparse LDA models: in the first stage, convex relaxation converts two classical formulations of LDA to semidefinite programs, and the subspace perspective allows for straightforward regularization and estimation. After the initial convex relaxation, a refinement stage improves the accuracy. For the first model, a penalized quadratic program with a group lasso penalty is used for refinement, whereas a sparse version of the power method is used for the second model. We carefully examine the theoretical properties of both methods, alongside simulations and real data analysis.
7.
We propose a new method for dimension reduction in regression using the first two inverse moments, and develop corresponding weighted chi-squared tests for the dimension of the regression. The proposed method considers linear combinations of sliced inverse regression (SIR) and a method using a new candidate matrix designed to recover the entire inverse second-moment subspace. The optimal combination may be selected based on the p-values derived from the dimension tests. Theoretically, the proposed method, like sliced average variance estimation (SAVE), is more capable of recovering the complete central dimension-reduction subspace than SIR and principal Hessian directions (pHd); it can therefore substitute for SIR, pHd, SAVE, or any linear combination of them at a theoretical level. A simulation study indicates that the proposed method may have consistently greater power than SIR, pHd, and SAVE.
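For readers unfamiliar with SIR, the basic first-inverse-moment estimator it builds on can be sketched in a few lines. This is the standard SIR recipe (standardize, slice on the response, eigen-decompose the covariance of slice means), not the combined method proposed in the paper.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Basic sliced inverse regression: standardize X, slice on y,
    eigen-decompose the weighted covariance of slice means, and map
    the leading eigenvectors back to the original scale."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    Sih = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T   # S^{-1/2}
    Z = (X - X.mean(axis=0)) @ Sih                       # standardized data
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)             # weighted slice means
    w, V = np.linalg.eigh(M)
    B = Sih @ V[:, np.argsort(w)[::-1][:n_dirs]]         # back to X scale
    return B / np.linalg.norm(B, axis=0)
```

SIR's known weakness, which motivates second-moment methods like SAVE and the candidate matrix above, is that slice means carry no signal when the link function is symmetric about the mean.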
8.
Problems involving high-dimensional data, such as pattern recognition, image analysis, and gene clustering, often require a preliminary step of dimension reduction before or during statistical analysis. If one restricts attention to linear techniques for dimension reduction, the remaining issue is the choice of the projection. This choice can be dictated by the desire to maximize certain statistical criteria of the projected data, including variance, kurtosis, sparseness, and entropy. Motivation for such criteria comes from past empirical studies of the statistics of natural and urban images. We present a geometric framework for finding projections that are optimal for obtaining certain desired statistical properties. Our approach is to define an objective function on spaces of orthogonal linear projections (Stiefel and Grassmann manifolds) and to use gradient techniques to optimize that function. This construction uses the geometries of these manifolds to perform the optimization. Experimental results are presented to demonstrate these ideas for natural and facial images.
9.
Probabilistic Principal Component Analysis
Michael E. Tipping & Christopher M. Bishop 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1999,61(3):611-622
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
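Besides the EM algorithm the abstract mentions, the PPCA likelihood has a closed-form maximum: the weight matrix is built from the top-q eigenpairs of the sample covariance, and the noise variance is the average of the discarded eigenvalues. A minimal sketch of that closed-form solution:

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form ML solution of probabilistic PCA (Tipping & Bishop, 1999):
    W = U_q (L_q - sigma^2 I)^{1/2}, with sigma^2 the mean of the
    eigenvalues discarded from the sample covariance."""
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # sort descending
    sigma2 = vals[q:].mean()                 # average discarded eigenvalue
    W = vecs[:, :q] @ np.diag(np.sqrt(vals[:q] - sigma2))
    return W, sigma2

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
W, s2 = ppca_ml(X, 2)
C = W @ W.T + s2 * np.eye(5)   # fitted model covariance W W^T + sigma^2 I
```

The fitted covariance C reproduces the top-q sample eigenvalues exactly and replaces the rest by the common noise level sigma^2, which is what makes the probabilistic interpretation of PCA explicit.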
10.
We compare partial least squares (PLS) and principal component analysis (PCA) in a general case in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.
11.
Charles‐Elie Rabier, Brigitte Mangin & Simona Grusea 《Scandinavian Journal of Statistics》2019,46(1):289-313
Genomic selection is today a hot topic in genetics. It consists of predicting breeding values of selection candidates using the large number of genetic markers now available thanks to recent progress in molecular biology. One of the most popular methods chosen by geneticists is ridge regression. We focus on some predictive aspects of ridge regression and present theoretical results regarding the accuracy criterion, that is, the correlation between predicted value and true value. We show the influence of the singular values, the regularization parameter, and the projection of the signal on the space spanned by the rows of the design matrix. Asymptotic results in a high-dimensional framework are given; in particular, we prove that the convergence to optimal accuracy depends strongly on a weighted projection of the signal on each subspace. We discuss how to improve the prediction. Finally, illustrations on simulated and real data are proposed.
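The role of the singular values and the regularization parameter described in this abstract is easiest to see in the SVD form of the ridge estimator, where each component of the signal is shrunk by d_i/(d_i^2 + lambda). A generic sketch (not the paper's code):

```python
import numpy as np

def ridge_predict(X, y, X_new, lam):
    """Ridge regression via the SVD X = U diag(d) V^T:
    beta = V diag(d / (d^2 + lam)) U^T y, so each singular direction
    contributes the projection of y on it, shrunk by d_i/(d_i^2 + lam)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    shrink = d / (d ** 2 + lam)          # per-component shrinkage factor
    beta = Vt.T @ (shrink * (U.T @ y))
    return X_new @ beta, beta
```

Directions with small singular values are shrunk most strongly, which is precisely why the accuracy depends on how the true signal projects onto the row space of the design matrix.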
12.
Not only are copula functions joint distribution functions in their own right, they also provide a link between multivariate distributions and their lower‐dimensional marginal distributions. Copulas have a structure that allows us to characterize all possible multivariate distributions, and therefore they have the potential to be a very useful statistical tool. Although copulas can be traced back to 1959, there is still much scope for new results, as most of the early work was theoretical rather than practical. We focus on simple practical tools based on conditional expectation, because such tools are not widely available. When dealing with data sets in which the dependence throughout the sample is variable, we suggest that copula‐based regression curves may be more accurate predictors of specific outcomes than linear models. We derive simple conditional expectation formulae in terms of copulas and apply them to a combination of simulated and real data.
13.
H. Fotouhi 《Journal of applied statistics》2012,39(10):2199-2207
Most linear statistical methods deal with data lying in a Euclidean space. However, there are many examples, such as DNA molecule topological structures, in which the initial or the transformed data lie in a non-Euclidean space. To obtain a measure of variability in these situations, principal component analysis (PCA) is usually performed on a Euclidean tangent space, as it cannot be implemented directly on a non-Euclidean space. Principal geodesic analysis (PGA), by contrast, is a newer tool that provides a measure of variability for nonlinear statistics. In this paper, the performance of this tool is compared with that of PCA using a real data set representing a DNA molecular structure. It is shown that, due to the nonlinearity of the space, PGA explains more of the variability of the data than PCA.
14.
Simple principal components
S. K. Vines 《Journal of the Royal Statistical Society. Series C, Applied statistics》2000,49(4):441-451
We introduce an algorithm for producing simple approximate principal components directly from a variance–covariance matrix. At the heart of the algorithm is a series of 'simplicity preserving' linear transformations. Each transformation seeks a direction within a two-dimensional subspace that has maximum variance. However, the choice of directions is limited so that the direction can be represented by a vector of integers whenever the subspace can also be represented by vectors of integers. The resulting approximate components can therefore always be represented by integers. Furthermore, the elements of these integer vectors are often small, particularly for the first few components. We demonstrate the performance of this algorithm on two data sets and show that good approximations to the principal components that are also clearly simple and interpretable can result.
15.
《Journal of statistical planning and inference》2005,131(2):333-347
The generalized cross-validation (GCV) method has been a popular technique for selecting tuning parameters for smoothing and penalization, and has become a standard tool for selecting tuning parameters in shrinkage models. Its computational ease and robustness compared to cross-validation also make it competitive for model selection. It is well known that GCV performs well for linear estimators, which are linear functions of the response variable, such as the ridge estimator. However, it may not perform well for nonlinear estimators, since GCV emphasizes linear characteristics by taking the trace of the projection matrix. This paper aims to explore GCV for nonlinear estimators and to further extend the results to correlated data in longitudinal studies. We expect that the nonlinear GCV and quasi-GCV developed in this paper will provide similar tools for the selection of tuning parameters in linear penalty models and penalized GEE models.
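For the linear-estimator case the abstract takes as its starting point, GCV has a simple closed form: with hat matrix H(lambda), GCV(lambda) = n * ||(I - H) y||^2 / (n - tr H)^2. A sketch for the ridge estimator (a generic illustration, not the paper's nonlinear extension):

```python
import numpy as np

def gcv_ridge(X, y, lambdas):
    """Generalized cross-validation for the ridge estimator, a linear
    smoother with hat matrix H(lam) = X (X'X + lam I)^{-1} X'.
    Returns GCV(lam) = n * ||(I - H) y||^2 / (n - tr H)^2 for each lam."""
    n, p = X.shape
    scores = []
    for lam in lambdas:
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
        resid = y - H @ y
        scores.append(n * (resid @ resid) / (n - np.trace(H)) ** 2)
    return np.array(scores)
```

The trace term is exactly the quantity the abstract flags as problematic for nonlinear estimators: when the fit is not a linear function of y, tr(H) no longer measures the effective degrees of freedom correctly.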
16.
Dimension reduction for model-based clustering
Luca Scrucca 《Statistics and Computing》2010,20(4):471-484
We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation in group means and, depending on the estimated mixture model, from the variation in group covariances. The proposed method aims at reducing the dimensionality by identifying a set of linear combinations of the original features, ordered by importance as quantified by the associated eigenvalues, which capture most of the cluster structure contained in the data. Observations may then be projected onto such a reduced subspace, providing summary plots that help to visualize the clustering structure. These plots can be particularly appealing in the case of high-dimensional data and noisy structure. The newly constructed variables capture most of the clustering information available in the data, and they can be further reduced to improve clustering performance. We illustrate the approach on both simulated and real data sets.
17.
We address the problem of recovering a common set of covariates that are relevant simultaneously to several classification problems. By penalizing the sum of ℓ2 norms of the blocks of coefficients associated with each covariate across the different classification problems, similar sparsity patterns in all models are encouraged. To take computational advantage of the sparsity of solutions at high regularization levels, we propose a blockwise path-following scheme that approximately traces the regularization path. As the regularization coefficient decreases, the algorithm maintains and updates a growing set of covariates that are simultaneously active for all problems. We also show how to use random projections to extend this approach to the problem of joint subspace selection, where multiple predictors are found in a common low-dimensional subspace. We present theoretical results showing that this random-projection approach converges to the solution yielded by trace-norm regularization. Finally, we present a variety of experimental results exploring joint covariate selection and joint subspace selection, comparing the path-following approach to competing algorithms in terms of prediction accuracy and running time.
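The sum-of-ℓ2-norms penalty above has a well-known proximal operator, blockwise soft-thresholding, which shows directly how joint sparsity arises: a covariate's whole coefficient block is zeroed or shrunk as a unit. A generic sketch (this is the standard group-lasso prox, not the paper's path-following algorithm):

```python
import numpy as np

def block_soft_threshold(B, lam):
    """Proximal step for a sum-of-l2-norms penalty: each row of B holds
    one covariate's coefficients across all classification problems.
    Rows with norm below lam are zeroed jointly; the rest are shrunk,
    giving the shared sparsity pattern across problems."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * B
```

At high regularization levels most rows are exactly zero, which is what the blockwise path-following scheme exploits by tracking only the active set.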
18.
《Journal of the Korean Statistical Society》2014,43(1):149-159
In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information, but classical PCA is vulnerable to outliers because of its dependence on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested; a popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance-maximization criterion. Our approach is based instead on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from previous approaches. The proposed robust PCA is derived by replacing the squared loss function with a robust alternative, the Huber loss function. A practical algorithm is proposed for carrying out the optimization, and the convergence properties of the algorithm are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
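The Huber loss at the center of this proposal is quadratic for small residuals and linear beyond a cutoff, so large outliers contribute only linearly instead of quadratically. A minimal sketch of the loss itself (the full robust-PCA algorithm of the paper is not reproduced here):

```python
import numpy as np

def huber(r, delta=1.345):
    """Huber loss: 0.5*r^2 for |r| <= delta, and the linear continuation
    delta*(|r| - 0.5*delta) beyond, bounding the influence of outliers
    relative to the squared loss."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))
```

Substituting this loss for the squared residual in the regression-type formulation of PCA is what makes the resulting principal components resistant to outlying observations.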
19.
David Hamilton 《Revue canadienne de statistique》1987,15(2):127-135
Exact confidence regions for all the parameters in nonlinear regression models can be obtained by comparing the lengths of projections of the error vector into orthogonal subspaces of the sample space. In certain partially nonlinear models an alternative exact region is obtained by replacing the linear parameters by their conditional estimates in the projection matrices. An ellipsoidal approximation to the alternative region is obtained in terms of the tangent-plane coordinates, similar to one previously obtained for the more usual region. This ellipsoid can be converted to an approximate region for the original parameters and can be used to compare the two types of exact confidence regions.
20.
While studying the results of a European Parliament election, the question of the suitability of principal component analysis (PCA) for this kind of data was raised. Since multiparty data should be seen as compositional data (CD), the application of PCA is inadvisable and may lead to misleading results. This work points out the limitations of PCA for CD and presents a practical application to the results of the 2004 European Parliament election. We present a comparative study of the results of PCA, Crude PCA and Logcontrast PCA (Aitchison in Biometrika 70:57–61, 1983; Kucera and Malmgren in Marine Micropaleontology 34:117–120, 1998). For the data set considered, the approach that produced the clearest results was the Logcontrast PCA. Moreover, Crude PCA led to misleading results, since nonlinear relations were present between the variables, and linear PCA proved, once again, to be inappropriate for analysing data that can be seen as CD.
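The logcontrast idea can be illustrated with the centred log-ratio (clr) transform: map each composition to centred logs, then run ordinary PCA on the transformed data. This sketch uses the clr variant as a stand-in for the Logcontrast PCA discussed in the abstract, not the paper's exact procedure.

```python
import numpy as np

def clr_pca(P, q=2):
    """PCA after the centred log-ratio transform, for strictly positive
    compositional data whose rows sum to one: clr frees the data from
    the unit-sum constraint that makes raw PCA on compositions inadvisable."""
    L = np.log(P)
    Z = L - L.mean(axis=1, keepdims=True)   # clr: centre each row's logs
    Zc = Z - Z.mean(axis=0)                 # centre columns for PCA
    vals, vecs = np.linalg.eigh(np.cov(Zc, rowvar=False))
    order = np.argsort(vals)[::-1][:q]
    return Zc @ vecs[:, order]              # scores on the leading axes

# e.g. simulated vote shares for four parties across 100 districts
rng = np.random.default_rng(7)
P = rng.dirichlet([2.0, 3.0, 5.0, 1.0], size=100)
scores = clr_pca(P, q=2)
```

Because the clr transform removes the spurious negative correlations induced by the unit-sum constraint, the resulting components are interpretable in a way that Crude PCA on the raw shares is not.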