Similar literature
 Found 20 similar documents; search took 31 ms
1.
In this paper, some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus, a new multivariate association measure is proposed, based on the links existing between canonical correlation analysis and principal component analysis, which can be more suitably used for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally, an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and theoretically discussed.

2.
This paper is an overview of a unified framework for analyzing designed experiments with univariate or multivariate responses. Both categorical and continuous design variables are considered. To handle unbalanced data, we introduce the so-called Type II* sums of squares. This means that the results are independent of the scale chosen for continuous design variables. Furthermore, it does not matter whether two-level variables are coded as categorical or continuous. Overall testing of all responses is done by 50-50 MANOVA, which handles several highly correlated responses. Univariate p-values for each response are adjusted by using rotation testing. To illustrate multivariate effects, mean values and mean predictions are displayed in a principal component score plot or directly as curves. For the unbalanced cases, we introduce a new variant of adjusted means, which are independent of the coding of two-level variables. The methodology is exemplified by case studies from cheese and fish pudding production.

3.
In this article we study the relationship between principal component analysis and a multivariate dependency measure. It is shown, via simulated examples and real data, that the information provided by principal components is compatible with that obtained via the dependency measure δ. Furthermore, we show that in some instances in which principal component analysis fails to give reasonable results due to nonlinearity among the random variables, the dependency statistic δ still provides good results. Finally, we give some ideas about using the statistic δ in order to reduce the dimensionality of a given data set.

4.
The analysis of high-dimensional data often begins with the identification of lower dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best “reconstruct” the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.

5.
In this study, classical and robust principal component analyses are used to evaluate the socioeconomic development of the regions served by development agencies, which work to reduce development differences among regions in Turkey. Because development levels differ greatly between regions, outliers occur, and robust statistical methods are therefore used. Classical and robust statistical methods are also used to investigate whether the data set contains outliers. In classical principal component analysis, the number of observations must be larger than the number of variables; otherwise, the determinant of the covariance matrix is zero. The Robust method for Principal Component Analysis (ROBPCA), a robust approach to principal component analysis for high-dimensional data, yields principal components even when the number of variables is larger than the number of observations. In this paper, 26 development agencies are first evaluated on 19 variables using principal component analysis based on classical and robust scatter matrices, and the same 26 agencies are then evaluated on 46 variables using the ROBPCA method.
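
The statement that the covariance determinant is zero when the number of observations does not exceed the number of variables can be checked on a toy example. The sketch below is illustrative only: the data values and the helper names `sample_covariance` and `determinant` are assumptions made here, not taken from the paper, and exact fractions are used so that the zero is not blurred by rounding.

```python
from fractions import Fraction


def sample_covariance(rows):
    """Unbiased sample covariance matrix of rows (observations x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def determinant(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * determinant(minor)
    return total


# Hypothetical toy data: 3 observations on 3 variables (n is not larger than p).
few_rows = [[1, 2, 0], [4, 1, 3], [2, 5, 1]]
# Adding a fourth observation gives n > p.
more_rows = few_rows + [[0, 0, 7]]

det_few = determinant(sample_covariance(few_rows))
det_more = determinant(sample_covariance(more_rows))
print("n = 3, p = 3:", det_few)
print("n = 4, p = 3:", det_more)
```

Centering removes one degree of freedom, so the covariance matrix built from n observations has rank at most n - 1; with n = 3 and p = 3 it cannot have full rank, which is why the first determinant vanishes.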

6.
Principal components are often used for reducing dimensions in multivariate data, but they frequently fail to provide useful results and their interpretation is rather difficult. In this article, the use of entropy optimization principles for dimensional reduction in multivariate data is proposed. Under the assumption of multivariate normality, a four-step procedure is developed for selecting principal variables and hence discarding redundant variables. To assess the comparative performance of the information-theoretic procedure, we use simulated data with known dimensionality. Principal variables of cluster bean (Guar) are identified by applying this procedure to a real data set generated in a plant breeding experiment.

7.
In practice, when a principal component analysis is applied to a large number of variables, the resultant principal components may not be easy to interpret, as each principal component is a linear combination of all the original variables. Selection of a subset of variables that contains, in some sense, as much information as possible and enhances the interpretations of the first few covariance principal components is one possible approach to tackle this problem. This paper describes several variable selection criteria and investigates which criteria are best for this purpose. Although some criteria are shown to be better than others, the main message of this study is that it is unwise to rely on only one or two criteria. It is also clear that the interdependence between variables and the choice of how to measure closeness between the original components and those using subsets of variables are both important in determining the best criteria to use.

8.
Tanaka (1988) has derived the influence functions, which are equivalent to the perturbation expansions up to linear terms, of two functions of eigenvalues and eigenvectors of a real symmetric matrix, and applied them to principal component analysis. The present paper deals with the perturbation expansions up to quadratic terms of the same functions and discusses their application to sensitivity analysis in multivariate methods, in particular, principal component analysis and principal factor analysis. Numerical examples are given to show how the approximation improves with the quadratic terms.

9.
The identification of the out-of-control variable, or variables, after a multivariate control chart signals has been an appealing subject for many researchers in recent years. In this paper we propose a new method for approaching this problem based on principal components analysis. Theoretical control limits are derived and a detailed investigation of the properties and the limitations of the new method is given. A graphical technique which can be applied in some of these limiting situations is also provided.

10.
Interpretation of principal components is difficult because their weights (loadings, coefficients) are of various sizes. Whereas very small weights or very large weights can give a clear indication of the importance of particular variables, weights that are neither large nor small (‘grey area’ weights) are problematic. This is a particular problem in the fast-moving goods industries, where a lot of multivariate panel data are collected on products. These panel data are subjected to univariate analyses and multivariate analyses where principal components (PCs) are key to the interpretation of the data. Several authors have suggested alternatives to PCs, seeking simplified components such as sparse PCs. Here components, termed simple components (SCs), are sought in conjunction with Thurstonian criteria that a component should have only a few variables highly weighted on it and each variable should be weighted heavily on just a few components. An algorithm is presented that finds SCs efficiently. Simple components are found for panel data consisting of the responses to a questionnaire on efficacy and other features of deodorants. It is shown that five SCs can explain an amount of variation within the data comparable to that explained by the PCs, but with easier interpretation.

11.
Evaluation of trace evidence in the form of multivariate data   Total citations: 1 (self-citations: 0; citations by others: 1)
Summary.  The evaluation of measurements on characteristics of trace evidence found at a crime scene and on a suspect is an important part of forensic science. Five methods of assessment for the value of the evidence for multivariate data are described. Two are based on significance tests and three on the evaluation of likelihood ratios. The likelihood ratio which compares the probability of the measurements on the evidence assuming a common source for the crime scene and suspect evidence with the probability of the measurements on the evidence assuming different sources for the crime scene and suspect evidence is a well-documented measure of the value of the evidence. One of the likelihood ratio approaches transforms the data to a univariate projection based on the first principal component. The other two versions of the likelihood ratio for multivariate data account for correlation among the variables and for two levels of variation: that between sources and that within sources. One version assumes that between-source variability is modelled by a multivariate normal distribution; the other version models the variability with a multivariate kernel density estimate. Results are compared from the analysis of measurements on the elemental composition of glass.

12.
Principal component analysis is a popular dimension reduction technique often used to visualize high-dimensional data structures. In genomics, this can involve millions of variables, but only tens to hundreds of observations. Theoretically, such extreme high dimensionality will cause biased or inconsistent eigenvector estimates, but in practice, the principal component scores are used for visualization with great success. In this paper, we explore when and why the classical principal component scores can be used to visualize structures in high-dimensional data, even when there are few observations compared with the number of variables. Our argument is twofold: First, we argue that eigenvectors related to pervasive signals will have eigenvalues scaling linearly with the number of variables. Second, we prove that for linearly increasing eigenvalues, the sample component scores will be scaled and rotated versions of the population scores, asymptotically. Thus, the visual information of the sample scores will be unchanged, even though the sample eigenvectors are biased. In the case of pervasive signals, the principal component scores can be used to visualize the population structures, even in extreme high-dimensional situations.

13.
Block-structured correlation matrices are correlation matrices in which the p variables are subdivided into homogeneous groups, with equal correlations for variables within each group, and equal correlations between any given pair of variables from different groups. Block-structured correlation matrices arise as approximations for certain data sets’ true correlation matrices. A block structure in a correlation matrix entails a certain number of properties regarding its eigendecomposition and, therefore, a principal component analysis of the underlying data. This paper explores these properties, both from an algebraic and a geometric perspective, and discusses their robustness. Suggestions are also made regarding the choice of variables to be subjected to a principal component analysis, when in the presence of (approximately) block-structured variables.
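
One eigendecomposition property of this kind follows from direct multiplication: a contrast within a single group (weights that sum to zero and are zero outside the group) is an eigenvector with eigenvalue one minus that group's within-group correlation. The sketch below checks this on a small matrix. The group sizes, the correlation values, and the helper names `block_correlation` and `matvec` are assumptions chosen for illustration; the abstract does not say which properties the paper itself derives.

```python
from fractions import Fraction


def block_correlation(sizes, within, between):
    """Build a block-structured correlation matrix.

    sizes[g] is the number of variables in group g, within[g] the common
    correlation inside group g, and between[g][h] the common correlation
    between variables of groups g and h.
    """
    groups = [g for g, size in enumerate(sizes) for _ in range(size)]
    matrix = []
    for gi_index, gi in enumerate(groups):
        row = []
        for gj_index, gj in enumerate(groups):
            if gi_index == gj_index:
                row.append(Fraction(1))      # unit diagonal
            elif gi == gj:
                row.append(within[gi])       # same group
            else:
                row.append(between[gi][gj])  # different groups
        matrix.append(row)
    return matrix


def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical example: a group of 3 variables and a group of 2 variables.
R = block_correlation(
    sizes=[3, 2],
    within=[Fraction(3, 5), Fraction(3, 10)],
    between=[[None, Fraction(1, 5)], [Fraction(1, 5), None]],
)

contrast = [1, -1, 0, 0, 0]        # contrast inside the first group
image = matvec(R, contrast)
eigenvalue = 1 - Fraction(3, 5)    # one minus the within-group correlation
print(image == [eigenvalue * c for c in contrast])
```

Because the contrast sums to zero, both the within-group and the between-group correlations cancel in every row, leaving only the diagonal contribution.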

14.
ADE-4: a multivariate analysis and graphical display software   Total citations: 59 (self-citations: 0; citations by others: 59)
We present ADE-4, a multivariate analysis and graphical display software. Multivariate analysis methods available in ADE-4 include usual one-table methods like principal component analysis and correspondence analysis, spatial data analysis methods (using a total variance decomposition into local and global components, analogous to Moran and Geary indices), discriminant analysis and within/between groups analyses, many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal regression (principal component regression), projection methods like principal component analysis on instrumental variables, canonical correspondence analysis and many other variants, coinertia analysis and the RLQ method, and several three-way table (k-table) analysis methods. Graphical display techniques include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, thus providing a very efficient way for automatic k-table graphics and geographical mapping options. A dynamic graphic module allows interactive operations like searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous among all the programs; this contributes to making the use of ADE-4 very easy for non-specialists in statistics, data analysis or computer science.

15.
Principal component and correspondence analysis can both be used as exploratory methods for representing multivariate data in two dimensions. Circumstances under which the, possibly inappropriate, application of principal components to untransformed compositional data approximates to a correspondence analysis of the raw data are noted. Aitchison (1986) has proposed a method for the principal component analysis of compositional data involving transformation of the raw data. It is shown how this can be approximated by a correspondence analysis of appropriately transformed data. The latter approach may be preferable when there are zeroes in the data.

16.
Process capability indices measure the ability of a process to provide products that meet certain specifications. Few references deal with the capability of a process characterized by a functional relationship between a response variable and one or more explanatory variables, which is called a profile. Specifically, no reference analyses the capability of processes characterized by multivariate nonlinear profiles. In this paper, we propose a method to measure the capability of these processes, based on principal components for multivariate functional data and the concept of functional depth. A simulation study is conducted to assess the performance of the proposed method. An example from sugar production illustrates the applicability of this approach.

17.
Biplots are a widely used statistical tool for visualizing the loadings and scores that result from a dimension reduction technique applied to multivariate data. If the underlying data carry only relative information (i.e. compositional data expressed in proportions, mg/kg, etc.), they have to be pre-processed with a logratio transformation before the dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called a compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This makes it possible to also incorporate external non-compositional variables and to study their relations to the compositional variables. The methodology is demonstrated on real data sets.
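
Why a logratio transformation suits data that carry only relative information can be seen with the centered logratio (clr), one common member of the logratio family. Note that clr is used here as a simpler stand-in and is not the ilr transformation the paper builds on; the function name `clr` and the sample values are assumptions for illustration. The sketch shows that the same composition expressed as proportions or in mg/kg gives the same transformed coordinates.

```python
import math


def clr(composition):
    """Centered logratio transform of a composition with strictly positive parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-part composition on two different scales.
proportions = [0.2, 0.3, 0.5]
mg_per_kg = [200.0, 300.0, 500.0]

clr_a = clr(proportions)
clr_b = clr(mg_per_kg)

print(clr_a)
print(clr_b)
```

Rescaling every part by the same factor adds the same constant to each logarithm, and subtracting the mean logarithm removes it again, so only the ratios between parts survive. The transformed coordinates also sum to zero, which is one reason orthonormal coordinates such as ilr are often preferred before a principal component analysis.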

18.
Dynamic principal component analysis (DPCA), also known as frequency domain principal component analysis, was developed by Brillinger [Time Series: Data Analysis and Theory, Vol. 36, SIAM, 1981] to decompose multivariate time-series data into a few principal component series. A primary advantage of DPCA is its capability of extracting essential components from the data by reflecting their serial dependence. It is also used to estimate the common component in a dynamic factor model, which is frequently used in econometrics. However, this beneficial property cannot be utilized when missing values are present, and missing values should not simply be ignored when estimating the spectral density matrix in the DPCA procedure. Based on a novel combination of conventional DPCA and the self-consistency concept, we propose a DPCA method for data with missing values. We demonstrate the advantage of the proposed method over some existing imputation methods through Monte Carlo experiments and real data analysis.

19.
In this article, we consider the performance of the principal component two-parameter estimator in the presence of multicollinearity for a misspecified linear regression model, where the misspecification is due to the omission of some relevant explanatory variables. The conditions under which the principal component two-parameter estimator is superior to some other estimators under the Mahalanobis loss function, by the average loss criterion, are derived. Furthermore, a real data example and a Monte Carlo simulation study are provided to illustrate some of the theoretical results.

20.
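A discussion of methods for aligning indicator directions in principal component and factor analysis   Total citations: 9 (self-citations: 0; citations by others: 9)
Sample principal component analysis and sample factor analysis have become one of the most important approaches to comprehensive evaluation, and aligning the directions of the indicator variables, so that all indicators trend the same way, is an important step in applying them. This article summarizes the specific methods for aligning indicator directions in principal component and factor analysis, discusses how these methods affect comprehensive evaluation, and identifies the conditions under which they are applicable.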
