Similar Documents
20 similar documents found.
1.
A robust biplot     
This paper introduces a robust biplot which is related to multivariate M-estimates. The n × p data matrix is first considered as a sample of size n from some p-variate population, and robust M-estimates of the population location vector and scatter matrix are calculated. In the construction of the biplot, each row of the data matrix is assigned a weight determined in the preliminary robust estimation. In a robust biplot, one can plot the variables in order to represent characteristics of the robust variance-covariance matrix: the length of the vector representing a variable is proportional to its robust standard deviation, while the cosine of the angle between two variables is approximately equal to their robust correlation. The proposed biplot also permits a meaningful representation of the variables in a robust principal-component analysis. The discrepancies between least-squares and robust biplots are illustrated in an example.
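
A minimal sketch of the two ingredients described above, assuming NumPy and a Huber-type reweighting scheme (the paper's exact M-estimator, the tuning constant `c`, and the function names are illustrative assumptions). Rows are down-weighted by their Mahalanobis distance, and variable markers come from the eigendecomposition of the resulting robust scatter matrix, so marker lengths approximate robust standard deviations and cosines of angles approximate robust correlations.

```python
import numpy as np


def robust_mean_cov(X, c=2.5, n_iter=200, tol=1e-9):
    """Huber-type M-estimates of location and scatter by iterative reweighting.

    Row weights are w_i = min(1, c / d_i), with d_i the Mahalanobis distance of
    row i under the current estimates (an assumed, commonly used scheme).
    """
    mu = np.median(X, axis=0)
    S = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff))
        w = np.minimum(1.0, c / np.maximum(d, 1e-12))
        mu_new = w @ X / w.sum()
        diff = X - mu_new
        S_new = (diff * (w ** 2)[:, None]).T @ diff / (w ** 2).sum()
        done = np.allclose(mu, mu_new, atol=tol) and np.allclose(S, S_new, atol=tol)
        mu, S = mu_new, S_new
        if done:
            break
    return mu, S, w


def variable_markers(S, k=2):
    """Variable coordinates G = V_k diag(sqrt(lambda_k)) from S = V diag(lambda) V^T.

    G G^T approximates S, so marker lengths approximate standard deviations and
    cosines of angles approximate correlations (exactly when k = number of variables).
    """
    lam, V = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    return V[:, :k] * np.sqrt(np.maximum(lam[:k], 0.0))


rng = np.random.default_rng(0)
cov = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=200)
X[:5] += 15.0                      # a few gross outliers
mu, S, w = robust_mean_cov(X)
G = variable_markers(S, k=2)       # 2-D variable markers for the biplot
```

The returned weights `w` are the per-row weights that can be carried into the biplot construction; the outlying rows receive small weights.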

2.
Conditions under which correspondence analysis maps are biplots are discussed, as well as the interpretation of such biplots. It is shown that the asymmetric map which jointly displays the profiles and the vertices which define the unit vectors in the profile space is a biplot. A number of different ways of interpreting this joint plot are discussed, some of these being dependent on the choice of the χ² metric in the profile space. Biplot axes can be defined and calibrated on the zero-to-one profile scale in the usual way to recover approximations to the individual profile elements. Finally, the biplot interpretation in the context of multiple correspondence analysis is discussed. It is pointed out that joint correspondence analysis leads to a joint display of several variables which can be calibrated in a similar fashion to recover profile elements of the subtables of the Burt matrix.
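
The biplot property of the asymmetric map can be checked numerically. This Python sketch (NumPy assumed; function names are illustrative) places row profiles in principal coordinates F and column vertices in standard coordinates G, then recovers profile elements through p_ij / r_i = c_j (1 + Σ_k f_ik g_jk), which is exact when all non-trivial dimensions are kept.

```python
import numpy as np


def ca_asymmetric(N):
    """Correspondence analysis of a contingency table N, rows treated as profiles.

    Returns row principal coordinates F, column standard coordinates G,
    row masses r, column masses c and the principal inertias.
    """
    P = N / N.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}: the chi-square metric
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]      # row profiles, principal coordinates
    G = Vt.T / np.sqrt(c)[:, None]          # column vertices, standard coordinates
    return F, G, r, c, sv ** 2


def reconstruct_profiles(F, G, c, k):
    """Biplot approximation of the row profile elements p_ij / r_i from k dimensions."""
    return c[None, :] * (1.0 + F[:, :k] @ G[:, :k].T)


N = np.array([[10, 5, 3], [2, 8, 6], [4, 4, 12], [7, 1, 2]], dtype=float)
F, G, r, c, inertias = ca_asymmetric(N)
profiles = N / N.sum(axis=1, keepdims=True)
approx_1d = reconstruct_profiles(F, G, c, k=1)   # one-dimensional biplot approximation
```

For a table with J columns the centred residual matrix has rank at most J - 1, so here k = 2 already reproduces the profiles exactly.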

3.
Biplots of compositional data (cited 6 times; self-citations: 0; citations by others: 6)
Summary. The singular value decomposition and its interpretation as a linear biplot have proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to the special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is applied to a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
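
A minimal sketch of a relative variation biplot, assuming NumPy and the common construction of double-centring the log of the compositions before the SVD (the function name is illustrative). The data below are random synthetic compositions, not the paintings data. The sketch shows the ratio property: differences between part markers correspond to centred log-ratios.

```python
import numpy as np


def logratio_biplot(X, k=2):
    """Relative-variation (log-ratio) biplot of strictly positive compositions X (n x D).

    The log matrix is double-centred and decomposed by the SVD. Rows are returned in
    principal coordinates F and parts in standard coordinates G, so F @ G.T ~ Z.
    """
    X = np.asarray(X, dtype=float)
    X = X / X.sum(axis=1, keepdims=True)            # closure to unit sum
    L = np.log(X)
    Z = L - L.mean(axis=1, keepdims=True) - L.mean(axis=0, keepdims=True) + L.mean()
    U, sv, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * sv[:k], Vt[:k].T, Z


rng = np.random.default_rng(1)
comps = rng.dirichlet(np.ones(6), size=22)          # 22 synthetic six-part compositions
F, G, Z = logratio_biplot(comps, k=2)
# The link between parts j and l is G[j] - G[l]; projecting the rows of F onto it
# approximates the centred log-ratio log(x_j / x_l).
```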

4.
We consider the joint analysis of two matched matrices which have common rows and columns, for example multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal components analysis for interval-scaled data, correspondence analysis for frequency data, log-ratio analysis of compositional data and linear biplots in general, all of which depend on the singular value decomposition. A simple result in matrix algebra shows that by setting up two matched matrices in a particular block format, matrix sum and difference components can be analysed using a single application of the singular value decomposition algorithm. The methodology is applied to data from the International Social Survey Program comparing male and female attitudes on working wives across eight countries. The resulting biplots optimally display the overall cross-cultural differences as well as the male-female differences. The case of more than two matched matrices is also discussed.
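
The abstract does not spell out the block format or any scaling (for example, halving the sum and difference), so the arrangement below is an assumed one that has the stated property. In this NumPy sketch, an orthogonal change of basis turns [[A, B], [B, A]] into a block-diagonal matrix with A + B and A - B on the diagonal, so one SVD yields both components.

```python
import numpy as np


def block_matrix(A, B):
    """Stack two matched matrices (same rows and columns) as [[A, B], [B, A]]."""
    return np.block([[A, B], [B, A]])


def sum_difference_components(A, B):
    """Singular values of the block matrix, which split into those of A + B and A - B.

    With the orthogonal Q = [[I, I], [I, -I]] / sqrt(2) on each side,
    Q [[A, B], [B, A]] Q = blockdiag(A + B, A - B).
    """
    return np.linalg.svd(block_matrix(A, B), compute_uv=False)


rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))      # e.g. one rows x columns table for one group
B = rng.normal(size=(8, 5))      # the matched table for the other group
sv = sum_difference_components(A, B)
```

The singular vectors follow the same pattern: a left vector of the sum component has the form [u; u] / sqrt(2), and one of the difference component the form [u; -u] / sqrt(2), which is how the two sets of biplot coordinates can be separated afterwards.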

5.
In this paper, we propose a graphical representation of data, and a test statistic based on it, for testing the goodness of fit of a completely specified null distribution. The graph is constructed as a linked line chart given by vectors which reflect the pattern of the order statistics. The test statistic is defined as the area determined by the chart, and its asymptotic distribution is derived under the null hypothesis. Computer simulations performed to study the power properties of the chart indicate that the test is powerful against scale alternatives. Furthermore, it is shown that the test is closely related to the Watson test.

6.
Regression methods for common data types such as measured, count and categorical variables are well understood, but statisticians increasingly need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, whether as the response or as a predictor. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score "scoring", whereas "backscoring" is a method to represent a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment studying the motion of children with cleft lip.
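
A minimal sketch of the distance-to-score step, assuming NumPy, classical (Torgerson) multidimensional scaling, and Gower's add-a-point formula for scoring a new observation from its distances alone. These are standard choices, not necessarily the authors' exact procedures, and backscoring is not sketched. Euclidean toy points are used so the result can be checked exactly.

```python
import numpy as np


def pairwise_distances(P):
    return np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Classical (Torgerson) scaling: scores whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - 1.0 / n                         # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                     # doubly centred inner products
    lam, V = np.linalg.eigh(B)
    order = np.argsort(lam)[::-1][:k]
    return V[:, order] * np.sqrt(np.maximum(lam[order], 0.0))


def score_new_observation(X, d2_new):
    """Gower's add-a-point formula: scores for a new observation, given its squared
    distances d2_new to the n reference observations with MDS scores X."""
    b = np.sum(X ** 2, axis=1)                      # squared norms of reference scores
    lam = np.sum(X ** 2, axis=0)                    # X^T X is diagonal for MDS scores
    return 0.5 * (X.T @ (b - d2_new)) / lam


rng = np.random.default_rng(3)
P = rng.normal(size=(10, 2))                        # reference objects (Euclidean here)
X = classical_mds(pairwise_distances(P), k=2)
z = np.array([0.3, -1.2])                           # a new object
d2 = ((P - z) ** 2).sum(axis=1)
x_new = score_new_observation(X, d2)
```

With non-Euclidean distances, or k below the intrinsic dimension, both steps become approximations rather than exact embeddings.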

7.
Confronted with multivariate group-structured data, one is in fact always interested in describing differences between groups. In this paper, canonical correlation analysis (CCA) is used as an exploratory data analysis tool to detect and describe differences between groups of objects. CCA allows for the construction of Gabriel biplots, relating representations of objects and variables in the plane that best represents the distinction between the groups of object points. In the case of non-linear CCA, transformations of the original variables are suggested to achieve a better group separation than that obtained by linear CCA. One can detect which (transformed) variables are responsible for this separation. The separation itself might be due to several characteristics of the data (e.g., distances between the centres of gravity of the original or transformed groups of object points, or differences in the structure of the original groups). Four case studies illustrate the possibilities offered by linear and non-linear CCA.
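
For the linear case, CCA between the variables and a set of group indicators gives the plane that best separates the groups. This NumPy sketch computes canonical correlations through QR decompositions followed by an SVD, a numerically stable route. The non-linear (optimal scaling) variant is not sketched, and the simulated data and function names are illustrative.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations and coefficient matrices via QR + SVD.

    Xc @ A[:, i] and Yc @ B[:, i] are the i-th pair of canonical variates.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, rho, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)
    B = np.linalg.solve(Ry, Vt.T)
    return rho, A, B


def group_dummies(labels):
    """Indicator coding with the first level dropped (columns stay independent after centring)."""
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)


rng = np.random.default_rng(4)
labels = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 4))
X[:, 0] += 3.0 * labels                            # groups differ in the first variable
rho, A, B = canonical_correlations(X, group_dummies(labels))
object_scores = (X - X.mean(axis=0)) @ A[:, :2]    # plane that best separates the groups
```

For the variable side of a biplot in this plane, one possible (assumed) choice is the correlation of each original variable with the two columns of `object_scores`.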

8.
Given multivariate normal data and a certain spherically invariant prior distribution on the covariance matrix, it is desired to estimate the moments of the posterior marginal distributions of some scalar functions of the covariance matrix by importance sampling. To this end a family of distributions is defined on the group of orthogonal matrices and a procedure is proposed for selecting one of these distributions for use as a weighting distribution in the importance sampling process. In an example estimates are calculated for the posterior mean and variance of each element in the covariance matrix expressed in the original coordinates, for the posterior mean of each element in the correlation matrix expressed in the original coordinates, and for the posterior mean of each element in the covariance matrix expressed in the coordinates of the principal variables.

9.
The aim of this study is to obtain robust canonical vectors and correlation coefficients by using the percentage bend correlation and the winsorized correlation in the correlation matrix, and the fast consistent high breakdown (FCH), reweighted fast consistent high breakdown (RFCH), and reweighted multivariate normal (RMVN) estimators to estimate the covariance matrix, and then to compare these estimators with existing ones. In the correlation matrix of canonical correlation analysis (CCA), we substitute the percentage bend correlation and the winsorized correlation for the widely employed Pearson correlation. Moreover, we employ the FCH, RFCH, and RMVN estimators to estimate the covariance matrix in CCA. We conduct a simulation study and analyze real data in order to compare the performance of the different estimators of canonical vectors and correlations with that of our proposed approaches. Breakdown plots and independent tests are employed as criteria for differentiating the robustness and performance of the estimators. Based on our computational and real-data studies, we offer suggestions and guidelines on the practical implications of our findings.
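
A minimal sketch of the substitution idea, assuming NumPy: the winsorized and percentage bend correlations (the latter following the usual Wilcox-style definition with bend parameter `beta`, an assumption here) replace Pearson correlations when building the CCA correlation matrix. The FCH, RFCH and RMVN covariance estimators are not sketched, and all function names are illustrative.

```python
import numpy as np


def winsorize(x, gamma=0.2):
    """Replace the g = floor(gamma * n) smallest and largest values by the nearest kept values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = int(np.floor(gamma * n))
    xs = np.sort(x)
    return np.clip(x, xs[g], xs[n - g - 1])


def winsorized_corr(x, y, gamma=0.2):
    return np.corrcoef(winsorize(x, gamma), winsorize(y, gamma))[0, 1]


def _pb_scores(x, beta=0.2):
    """Bent scores a_i in [-1, 1] used by the percentage bend correlation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    med = np.median(x)
    omega = np.sort(np.abs(x - med))[int(np.floor((1 - beta) * n)) - 1]
    psi = (x - med) / omega
    i1, i2 = np.sum(psi < -1), np.sum(psi > 1)
    inner = x[(psi >= -1) & (psi <= 1)].sum()
    phi = (inner + omega * (i2 - i1)) / (n - i1 - i2)   # percentage bend location
    return np.clip((x - phi) / omega, -1.0, 1.0)


def percentage_bend_corr(x, y, beta=0.2):
    a, b = _pb_scores(x, beta), _pb_scores(y, beta)
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))


def robust_cca(X, Y, corr=winsorized_corr):
    """Canonical correlations from a correlation matrix built with a robust pairwise correlation.

    Pairwise robust correlation matrices need not be positive definite; this sketch
    assumes they are (the Cholesky factorisation raises an error otherwise).
    """
    Z = np.column_stack([X, Y])
    p, m = X.shape[1], Z.shape[1]
    R = np.eye(m)
    for i in range(m):
        for j in range(i + 1, m):
            R[i, j] = R[j, i] = corr(Z[:, i], Z[:, j])
    Lx = np.linalg.cholesky(R[:p, :p])
    Ly = np.linalg.cholesky(R[p:, p:])
    K = np.linalg.solve(Lx, R[:p, p:]) @ np.linalg.inv(Ly).T   # Lx^{-1} Rxy Ly^{-T}
    return np.linalg.svd(K, compute_uv=False)


x = np.arange(1.0, 21.0)
y = x.copy()
y[-1] = 1000.0                                   # one gross outlier in y
robust = (winsorized_corr(x, y), percentage_bend_corr(x, y))
```

On this toy pair, both robust correlations stay at 1 despite the outlier, while the Pearson correlation drops well below 1.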

10.
Generalized discriminant analysis based on distances (cited 14 times; self-citations: 1; citations by others: 13)
This paper describes a method of generalized discriminant analysis based on a dissimilarity matrix to test for differences in a priori groups of multivariate observations. Use of classical multidimensional scaling produces a low‐dimensional representation of the data for which Euclidean distances approximate the original dissimilarities. The resulting scores are then analysed using discriminant analysis, giving tests based on the canonical correlations. The asymptotic distributions of these statistics under permutations of the observations are shown to be invariant to changes in the distributions of the original variables, unlike the distributions of the multi‐response permutation test statistics which have been considered by other workers for testing differences among groups. This canonical method is applied to multivariate fish assemblage data, with Monte Carlo simulations to make power comparisons and to compare theoretical results and empirical distributions. The paper proposes classification based on distances. Error rates are estimated using cross‐validation.
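
A minimal sketch of the pipeline described above, assuming NumPy: principal coordinates from a dissimilarity matrix, canonical correlations between those scores and group indicators, and a permutation p-value for the trace statistic (sum of squared canonical correlations). The trace statistic, the city-block dissimilarity and the simulated data are illustrative choices, and the distance-based classification step is not sketched.

```python
import numpy as np


def principal_coordinates(D, k):
    """Classical scaling scores (k should not exceed the number of positive eigenvalues)."""
    n = D.shape[0]
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (D ** 2) @ J
    lam, V = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(lam[idx], 0.0))


def trace_statistic(Z, labels):
    """Sum of squared canonical correlations between the scores Z and group indicators."""
    levels = np.unique(labels)
    Y = (labels[:, None] == levels[None, 1:]).astype(float)
    Qz, _ = np.linalg.qr(Z - Z.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.sum(np.linalg.svd(Qz.T @ Qy, compute_uv=False) ** 2)


def permutation_test(D, labels, k=2, n_perm=999, seed=0):
    """Permutation p-value for group differences, computed from a dissimilarity matrix D."""
    rng = np.random.default_rng(seed)
    Z = principal_coordinates(D, k)
    observed = trace_statistic(Z, labels)
    exceed = sum(trace_statistic(Z, rng.permutation(labels)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(6)
pts = rng.normal(size=(30, 3))
labels = np.repeat([0, 1], 15)
pts[labels == 1, 0] += 4.0
D = np.abs(pts[:, None, :] - pts[None, :, :]).sum(axis=-1)   # city-block dissimilarities
stat, p_value = permutation_test(D, labels, k=2, n_perm=199)
```

Only the labels are permuted while the scores stay fixed, so the principal coordinates are computed once.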

11.
Popular rank-2 and rank-3 models for two-way tables have geometrical properties which can be used as diagnostic keys in screening for an appropriate model. Row and column levels of two-way tables are represented by points in two- or three-dimensional space, whereupon collinearity and coplanarity of row and column points provide diagnostic keys for informal model choice. Coordinates are obtained from a factorization of the two-way table $Y$ into the matrix product $UV^{T}$. The rows of $U$ then contain the row-point coordinates and the rows of $V$ the column-point coordinates. Illustrations of applications of diagnostic biplots in the literature were restricted to data from chemistry and physics with little or no noise. In plant breeding, two-way tables containing substantial amounts of noise regularly arise in the form of genotype by environment tables. To investigate the usefulness of diagnostic biplots for model screening for genotype by environment tables, data tables were generated from a range of two-way models under the addition of various amounts of noise. Chances for correct diagnosis of the generating model depended on the type of model. Diagnostic biplots on their own do not seem to provide a sufficient means for model selection for genotype by environment tables, but in combination with other methods they certainly can provide extra insight into the structure of the data.
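
To make the collinearity key concrete, here is a NumPy sketch that factorizes a two-way table as $UV^{T}$ through a rank-2 SVD and measures how close the row and column points lie to straight lines. The function names and the ratio-based collinearity measure are illustrative assumptions. For a noise-free additive table $y_{ij} = a_i + b_j$, both point sets are exactly collinear and the two lines are perpendicular.

```python
import numpy as np


def rank2_factors(Y):
    """Rank-2 factorisation Y ~ U V^T from the SVD, splitting each singular value evenly."""
    Uf, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    root = np.sqrt(sv[:2])
    return Uf[:, :2] * root, Vt[:2].T * root


def line_fit(points):
    """Return (collinearity ratio, unit direction) for a 2-D point cloud.

    The ratio s2 / s1 of the centred cloud's singular values is near 0 when the
    points lie close to a straight line; the direction is that of the fitted line.
    """
    _, s, Vt = np.linalg.svd(points - points.mean(axis=0))
    return s[1] / s[0], Vt[0]


rng = np.random.default_rng(7)
a, b = rng.normal(size=6), rng.normal(size=5)
Y_add = a[:, None] + b[None, :]                  # additive model y_ij = a_i + b_j
U, V = rank2_factors(Y_add)
row_ratio, row_dir = line_fit(U)
col_ratio, col_dir = line_fit(V)
```

When noise is added to `Y_add`, the ratios move away from zero, which reflects the difficulty the abstract reports for noisy genotype by environment tables.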

12.
The Ising model is one of the simplest and most famous models of interacting systems. It was originally proposed to model ferromagnetic interactions in statistical physics and is now widely used to model spatial processes in many areas such as ecology, sociology, and genetics, usually without testing its goodness of fit. Here, we propose various test statistics and an exact goodness‐of‐fit test for the finite‐lattice Ising model. The theory of Markov bases has been developed in algebraic statistics for exact goodness‐of‐fit testing using a Monte Carlo approach. However, finding a Markov basis is often computationally intractable. Thus, we develop a Monte Carlo method for exact goodness‐of‐fit testing for the Ising model that avoids computing a Markov basis and also leads to better connectivity of the Markov chain and hence to faster convergence. We show how this method can be applied to analyze the spatial organization of receptors on the cell membrane.
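
This is not the paper's Monte Carlo method. It is a brute-force NumPy sketch of what an exact conditional test computes, feasible only for a tiny lattice. Under the nearest-neighbour Ising model with ±1 spins and free boundaries, the conditional distribution given the sufficient statistics (sum of spins and sum of neighbour products) is uniform on the set of configurations sharing them, whatever the parameters. The diagonal-product test statistic is an assumed example.

```python
import itertools

import numpy as np


def ising_stats(x):
    """Sufficient statistics on a grid with free boundaries:
    (sum of spins, sum of products over horizontal and vertical neighbour pairs)."""
    t1 = x.sum()
    t2 = (x[:, 1:] * x[:, :-1]).sum() + (x[1:, :] * x[:-1, :]).sum()
    return int(t1), int(t2)


def diagonal_agreement(x):
    """Illustrative test statistic (assumed): sum of products over diagonal pairs."""
    return int((x[1:, 1:] * x[:-1, :-1]).sum() + (x[1:, :-1] * x[:-1, 1:]).sum())


def exact_conditional_pvalue(x_obs):
    """Exact p-value by enumerating every configuration with the observed sufficient
    statistics; MCMC over this set replaces enumeration on realistic lattices."""
    target = ising_stats(x_obs)
    observed = diagonal_agreement(x_obs)
    fibre = []
    for spins in itertools.product((-1, 1), repeat=x_obs.size):
        x = np.array(spins).reshape(x_obs.shape)
        if ising_stats(x) == target:
            fibre.append(diagonal_agreement(x))
    fibre = np.array(fibre)
    return np.mean(fibre >= observed), len(fibre)


x_obs = np.array([[1, 1, -1], [1, -1, -1], [1, -1, -1]])
p_value, fibre_size = exact_conditional_pvalue(x_obs)
```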

13.
In this paper, we introduce linear modeling of canonical correlation analysis, which estimates canonical direction matrices by minimising a quadratic objective function. The linear modeling yields a class of estimators of canonical direction matrices, and an optimal class is derived in the sense described herein. The optimal class has several desirable properties: first, its estimates of the canonical direction matrices are asymptotically efficient; second, its test statistic for determining the number of canonical covariates always has an asymptotic chi‐squared distribution; third, it is straightforward to construct tests for variable selection. Standard canonical correlation analysis and other existing methods turn out to be suboptimal members of the class. Finally, we study the role of canonical variates as a means of dimension reduction for predictors and responses in multivariate regression. Numerical studies and a data analysis are presented.

14.
ABSTRACT: We introduce a class of Toeplitz‐band matrices for simple goodness‐of‐fit tests for parametric regression models. For a given band length r, the asymptotically optimal solution is derived. Asymptotic normality of the corresponding test statistic is established under both fixed and random design assumptions, and for both linear and non‐linear models. This allows testing any parametric assumption, as well as computing confidence intervals for a quadratic measure of discrepancy between the parametric model and the true signal g. Furthermore, the connection between testing parametric goodness of fit and estimating the error variance is highlighted. As a by‐product, we obtain a much simpler proof of a result of [34] concerning the optimality of an estimator for the variance. Our results unify and generalize recent results of [9] and [15, 16] in several directions. Extensions to multivariate predictors and unbounded signals are discussed. A simulation study shows that a simple jackknife correction of the proposed test statistics leads to reasonable finite‐sample approximations.
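
The link between goodness-of-fit testing and variance estimation can be illustrated with the simplest band matrix: the first-difference (tridiagonal) form behind the difference-based variance estimator $\hat\sigma^2 = \sum_i (y_{i+1}-y_i)^2 / (2(n-1))$. This NumPy sketch does not use the paper's optimal Toeplitz-band weights. It compares the residual variance of a linear fit with that model-free estimate, which is near zero when the linear model is correct. Function names are illustrative.

```python
import numpy as np


def difference_matrix_form(n):
    """Band matrix A = D^T D / (2 (n - 1)), D the first-difference matrix,
    so that y^T A y equals the difference-based variance estimate."""
    D = np.diff(np.eye(n), axis=0)                  # (n - 1) x n, rows e_{i+1} - e_i
    return D.T @ D / (2 * (n - 1))


def difference_variance(y):
    """Difference-based estimate sum_i (y_{i+1} - y_i)^2 / (2 (n - 1)); y ordered by x."""
    y = np.asarray(y, dtype=float)
    return np.sum(np.diff(y) ** 2) / (2 * (len(y) - 1))


def residual_minus_difference(x, y):
    """Simple goodness-of-fit statistic: linear-model residual variance minus the
    model-free difference-based variance (near 0 when the linear model is right)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ beta) ** 2) - difference_variance(y)


rng = np.random.default_rng(8)
x = np.linspace(0.0, 1.0, 200)
noise = rng.normal(scale=0.1, size=x.size)
t_linear = residual_minus_difference(x, 1.0 + 2.0 * x + noise)
t_quadratic = residual_minus_difference(x, 1.0 + 5.0 * x ** 2 + noise)
```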

15.
Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

16.
Regularization is a well-known and widely used statistical approach covering individual points or limiting approximations. In this study, the path from canonical correlation analysis (CCA) to partial least squares (PLS) at the other boundary is discussed; the path is covered by a transformation to a symmetric eigenvalue (or singular value) problem that depends on a parameter. Two regularizations of the original criterion in the parameterization domain are compared, namely by projection and by the identity matrix. We discuss the existence and uniqueness of the analytic path for the eigenvalues and the corresponding elements of the eigenvectors. Specifically, canonical analysis is applied to an ill-conditioned case with singular within-set input matrices, using tourism accommodation data.
KEYWORDS: Multivariate analysis, canonical correlation analysis, optimization, analytic decomposition, paths of eigenvalues and eigenvectors, tourism
MSC Classifications: 62H20, 46N10, 62P20
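
A minimal NumPy sketch of the identity-matrix variant of such a path, assuming the within-set covariances are shrunk as $(1-\tau)S_{xx} + \tau I$ (and likewise for $S_{yy}$). At $\tau = 0$ the singular values are the canonical correlations, and at $\tau = 1$ they are the singular values of $S_{xy}$, the PLS criterion. The projection-based regularization and the analytic-path analysis are not sketched, and the simulated data are illustrative.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, V = np.linalg.eigh(S)
    return (V / np.sqrt(lam)) @ V.T


def cca_pls_path(X, Y, tau):
    """Singular values of Sxx(tau)^{-1/2} Sxy Syy(tau)^{-1/2},
    with Sxx(tau) = (1 - tau) Sxx + tau I and Syy(tau) defined likewise."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    Sxx_t = (1 - tau) * Sxx + tau * np.eye(Sxx.shape[0])
    Syy_t = (1 - tau) * Syy + tau * np.eye(Syy.shape[0])
    K = inv_sqrt(Sxx_t) @ Sxy @ inv_sqrt(Syy_t)
    return np.linalg.svd(K, compute_uv=False)


rng = np.random.default_rng(9)
X = rng.normal(size=(50, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(size=(50, 2))
path = [cca_pls_path(X, Y, t) for t in np.linspace(0.0, 1.0, 11)]
X_sing = np.column_stack([X, X[:, 0] + X[:, 1]])   # singular within-set matrix
reg = cca_pls_path(X_sing, Y, 0.5)                  # still defined for tau > 0
```

Plain CCA ($\tau = 0$) would fail on `X_sing` because its within-set covariance is singular, whereas any $\tau > 0$ keeps the shrunk matrix invertible.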

17.
Biplots are useful tools to explore the relationships among variables. In this paper, the specific regression relationship between a set of predictors X and a set of response variables Y, as modelled by partial least-squares (PLS) regression, is represented. The PLS biplot provides a single graphical representation of the samples together with the predictor and response variables, as well as their interrelationships in terms of the matrix of regression coefficients.

18.
This paper extends the results of the canonical correlation analysis of Anderson [2002. Canonical correlation analysis and reduced-rank regression in autoregressive models. Ann. Statist. 30, 1134–1154] to a vector AR(1) process with vector ARCH(1) innovations. We obtain the limiting distributions of the sample matrices, the canonical correlations and the canonical vectors of the process. The extension is important because many time series in economics and finance exhibit conditional heteroscedasticity. We also use simulation to demonstrate the effects of ARCH innovations on canonical correlation analysis in finite samples. Both the limiting distributions and the simulation results show that overlooking ARCH effects in canonical correlation analysis can easily lead to erroneous inference.

19.
The authors show how to test the goodness‐of‐fit of a linear regression model when there are missing data in the response variable. Their statistics are based on the L2 distance between nonparametric estimators of the regression function and a $\sqrt{n}$‐consistent estimator of the same function under the parametric model. They obtain the limit distribution of the statistics and check the validity of their bootstrap version. Finally, a simulation study allows them to examine the behaviour of their tests, whether the samples are complete or not.
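
A minimal NumPy sketch of an L2-type statistic of this kind, under stated assumptions. A Gaussian-kernel Nadaraya-Watson estimate is compared with a least-squares line over a grid, and missing responses (coded as NaN) are simply dropped. The authors' estimators for incomplete samples and their bootstrap calibration are not reproduced, and the bandwidth `h` and all names are illustrative.

```python
import numpy as np


def nadaraya_watson(x_obs, y_obs, grid, h):
    """Gaussian-kernel Nadaraya-Watson estimate of the regression function on a grid."""
    w = np.exp(-0.5 * ((grid[:, None] - x_obs[None, :]) / h) ** 2)
    return (w @ y_obs) / w.sum(axis=1)


def l2_statistic(x, y, h=0.1, n_grid=101):
    """Mean squared gap, over a grid, between a kernel fit and a least-squares line.

    Missing responses are coded as NaN and dropped (complete-case analysis).
    """
    keep = ~np.isnan(y)
    xo, yo = x[keep], y[keep]
    X = np.column_stack([np.ones_like(xo), xo])
    beta, *_ = np.linalg.lstsq(X, yo, rcond=None)
    grid = np.linspace(xo.min(), xo.max(), n_grid)
    gap = nadaraya_watson(xo, yo, grid, h) - (beta[0] + beta[1] * grid)
    return np.mean(gap ** 2)


rng = np.random.default_rng(10)
x = rng.uniform(size=200)
noise = rng.normal(scale=0.1, size=200)
missing = rng.uniform(size=200) < 0.2               # about 20% of responses missing
y_lin = np.where(missing, np.nan, 1.0 + 2.0 * x + noise)
y_quad = np.where(missing, np.nan, 1.0 + 5.0 * x ** 2 + noise)
stat_lin, stat_quad = l2_statistic(x, y_lin), l2_statistic(x, y_quad)
```

The statistic is larger under the quadratic alternative. To obtain critical values, the statistic would be calibrated, for example by a bootstrap, rather than read off directly.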

20.
Test statistics are developed for comparing vectors of proportions obtained from several independent two-stage cluster samples. It is assumed that clusters are selected with probability proportional to size for each sample. Wald's general method of constructing quadratic forms is used to obtain a large-sample chi-square test. More easily evaluated chi-square tests are derived from the Dirichlet-multinomial model. Corresponding goodness-of-fit tests for the Dirichlet-multinomial model are also derived.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417