Similar Articles
1.
ABSTRACT

Canonical correlations are maximized correlation coefficients indicating the relationships between pairs of canonical variates, which are linear combinations of the two sets of original variables. The number of non-zero canonical correlations in a population is called its dimensionality. Parallel analysis (PA) is an empirical method for determining the number of principal components or factors that should be retained in factor analysis. An example is given to illustrate how the proposed procedures, based on PA and bootstrap-modified PA, are adapted to the context of canonical correlation analysis (CCA). The performance of the proposed procedures is evaluated in a simulation study by comparing them with traditional sequential test procedures with respect to under-, correct and over-determination of dimensionality in CCA.

2.
ADE-4: a multivariate analysis and graphical display software (total citations: 59; self-citations: 0; citations by others: 59)
We present ADE-4, a multivariate analysis and graphical display software. Multivariate analysis methods available in ADE-4 include usual one-table methods like principal component analysis and correspondence analysis, spatial data analysis methods (using a total variance decomposition into local and global components, analogous to Moran and Geary indices), discriminant analysis and within/between groups analyses, many linear regression methods including lowess and polynomial regression, multiple and PLS (partial least squares) regression and orthogonal regression (principal component regression), projection methods like principal component analysis on instrumental variables, canonical correspondence analysis and many other variants, coinertia analysis and the RLQ method, and several three-way table (k-table) analysis methods. Graphical display techniques include an automatic collection of elementary graphics corresponding to groups of rows or to columns in the data table, thus providing a very efficient way for automatic k-table graphics and geographical mapping options. A dynamic graphic module allows interactive operations like searching, zooming, selection of points, and display of data values on factor maps. The user interface is simple and homogeneous among all the programs; this contributes to making the use of ADE-4 very easy for non-specialists in statistics, data analysis or computer science.

3.
In several research areas, such as psychology, social science and medicine, studies are conducted in which objects are ranked by different judges/raters and the concordance of the rankings is then analyzed. In such studies, it is also frequently of interest to compare the rankings between different groups of judges, e.g. female vs. male judges or judges from different professions. In the two-group case, the two-group concordance test of Schucany & Frawley can be employed for such a comparison. In this article, we propose an extension of this test that enables the comparison of rankings from more than two groups of judges. The test aims to detect disagreement in the average rankings of the objects among k groups, each with at least moderate intra-group concordance. We evaluate this test in an extensive simulation study and in an application to data from an aesthetics study. The simulation study shows that the proposed test is able to detect differences between average rankings and performs well even when the disagreement is comparatively small or the intra-group concordance is inhomogeneous.
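The test above presupposes at least moderate concordance within each group. A standard way to quantify concordance among m judges ranking n objects is Kendall's coefficient of concordance W. The minimal Python sketch below computes W and the average ranking per group; it assumes untied rankings, and the two groups and their rankings are hypothetical. It is not the Schucany & Frawley test or the authors' k-group extension, only the ingredients those tests compare.

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for one group of judges.

    rankings: list of m lists, each a ranking (1..n) of the same n objects
    by one judge. Assumes no ties within a judge's ranking.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Rank sum R_j of each object j over all judges in the group.
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    # S: sum of squared deviations of the rank sums from their mean.
    s = sum((rj - mean_sum) ** 2 for rj in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def average_ranking(rankings):
    """Average rank of each object within one group of judges."""
    m = len(rankings)
    return [sum(r[j] for r in rankings) / m for j in range(len(rankings[0]))]


# Two hypothetical groups of judges ranking the same four objects.
group_a = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
group_b = [[4, 3, 2, 1], [4, 2, 3, 1], [3, 4, 2, 1]]
for name, g in [("A", group_a), ("B", group_b)]:
    print(name, round(kendalls_w(g), 3), average_ranking(g))
```

Both hypothetical groups have the same within-group concordance (W = 420/540, about 0.778) but nearly opposite average rankings, which is the kind of between-group disagreement a k-group concordance test is meant to detect.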

4.
5.
In many experiments where data have been collected at two points in time (pre-treatment and post-treatment), investigators wish to determine if there is a difference between two treatment groups. In recent years it has been proposed that an appropriate statistical analysis to determine if treatment differences exist is to use the post-treatment values as the primary comparison variables and the pre-treatment values as covariates. When there are several outcome variables, we propose new tests based on residuals as alternatives to existing methods and investigate how the powers of the new and existing tests are affected by various choices of covariates. The limiting distribution of the test statistic of the new test based on residuals is given. Monte Carlo simulations are employed in the power comparisons.
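To make the residual idea concrete, the sketch below is a univariate simplification, assumed for illustration only: post-treatment values are regressed on pre-treatment values with both groups pooled, and the residuals of the two groups are compared with a Welch-type t statistic. The paper's tests handle several outcome variables and have their own limiting distribution, which this sketch does not reproduce; the data are hypothetical.

```python
from statistics import mean, variance


def ols_residuals(x, y):
    """Residuals from the pooled least-squares fit y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def residual_t(pre, post, group):
    """Welch-type t statistic comparing the mean residual of group 1 with
    group 0, after regressing post on pre with both groups pooled."""
    res = ols_residuals(pre, post)
    r0 = [r for r, g in zip(res, group) if g == 0]
    r1 = [r for r, g in zip(res, group) if g == 1]
    se = (variance(r0) / len(r0) + variance(r1) / len(r1)) ** 0.5
    return (mean(r1) - mean(r0)) / se


# Hypothetical data: group 1 gains about 5 units more than group 0.
pre = list(range(1, 11)) * 2
group = [0] * 10 + [1] * 10
noise = [0.1, -0.1] * 10
post = [2 * p + 5 * g + e for p, g, e in zip(pre, group, noise)]
print(round(residual_t(pre, post, group), 1))
```

Using the pre-treatment value as a covariate removes the variation it explains before the groups are compared, which is why the residual-based statistic is large here even though the raw post-treatment values overlap between groups.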

6.
This paper considers the analysis of linear models where the response variable is a linear function of observable component variables. For example, scores on two or more psychometric measures (the component variables) might be weighted and summed to construct a single response variable in a psychological study. A linear model is then fit to the response variable. The question addressed in this paper is how to optimally transform the component variables so that the response is approximately normally distributed. The transformed component variables, themselves, need not be jointly normal. Two cases are considered; in both cases, the Box-Cox power family of transformations is employed. In Case I, the coefficients of the linear transformation are known constants. In Case II, the linear function is the first principal component based on the matrix of correlations among the transformed component variables. For each case, an algorithm is described for finding the transformation powers that minimize a generalized Anderson-Darling statistic. The proposed transformation procedure is compared to likelihood-based methods by means of simulation. The proposed method rarely performed worse than likelihood-based methods and for many data sets performed substantially better. As an illustration, the algorithm is applied to a problem from rural sociology and social psychology, namely, scaling family residences along an urban-rural dimension.
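A minimal Case I sketch follows, under stated assumptions: two component variables with known weights, a coarse grid search over the Box-Cox powers (the paper describes its own algorithm), and the ordinary Anderson-Darling A² statistic with estimated mean and standard deviation standing in for the paper's generalized statistic. The lognormal test data are simulated, not from the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def box_cox(x, lam):
    """Box-Cox power transform of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1) / lam


def anderson_darling(sample):
    """Anderson-Darling A^2 against a normal with estimated mean and sd."""
    z = sorted(sample)
    n = len(z)
    cdf = NormalDist(mean(z), stdev(z)).cdf
    # Clamp CDF values away from 0 and 1 so the logarithms stay finite.
    f = [min(max(cdf(v), 1e-15), 1 - 1e-15) for v in z]
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i]))
            for i in range(n))
    return -n - s / n


def best_powers(x1, x2, w=(1.0, 1.0), grid=None):
    """Case I sketch: grid-search Box-Cox powers (l1, l2) so that the
    response w1*bc(x1, l1) + w2*bc(x2, l2) has the smallest A^2."""
    grid = grid or [k / 4 for k in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
    best = None
    for l1 in grid:
        for l2 in grid:
            y = [w[0] * box_cox(a, l1) + w[1] * box_cox(b, l2)
                 for a, b in zip(x1, x2)]
            a2 = anderson_darling(y)
            if best is None or a2 < best[0]:
                best = (a2, l1, l2)
    return best


# Simulated, right-skewed component variables (lognormal).
random.seed(1)
x1 = [math.exp(random.gauss(0, 0.5)) for _ in range(200)]
x2 = [math.exp(random.gauss(0, 0.5)) for _ in range(200)]
a2, l1, l2 = best_powers(x1, x2)
print(f"smallest A^2 = {a2:.3f} at powers ({l1}, {l2})")
```

A grid search is the simplest way to show the objective; a finer grid or a numerical optimizer would be the natural refinement.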

7.
Regularization is a well-known and widely used statistical approach covering individual points or limiting approximations. In this study, the path from canonical correlation analysis (CCA) to partial least squares (PLS), its other boundary, is discussed; the path is covered by a transformation to a symmetric eigenvalue (or singular value) problem that depends on a parameter. Two regularizations of the original criterion in the parameterization domain are compared, namely regularization by projection and regularization by the identity matrix. We discuss the existence and uniqueness of the analytic path for the eigenvalues and the corresponding elements of the eigenvectors. Specifically, canonical analysis is applied to an ill-conditioned case with singular within-set input matrices, using tourism accommodation data.
Keywords: multivariate analysis, canonical correlation analysis, optimization, analytic decomposition, paths of eigenvalues and eigenvectors, tourism
MSC classifications: 62H20, 46N10, 62P20

8.
An approach to non-linear principal components using radially symmetric kernel basis functions is described. The procedure consists of two stages. The first is a projection of the data set to a reduced dimension using a non-linear transformation whose parameters are determined by the solution of a generalized symmetric eigenvector equation. This is achieved by demanding a maximum-variance transformation subject to a normalization condition (Hotelling's approach), and can be related to the homogeneity analysis approach of Gifi through the minimization of a loss function. The transformed variables are the principal components, whose values define contours, or more generally hypersurfaces, in the data space. The second stage of the procedure defines the fitting surface, the principal surface, in the data space (again as a weighted sum of kernel basis functions) using the definition of self-consistency of Hastie and Stuetzle. The parameters of this principal surface are determined by a singular value decomposition, and cross-validation is used to obtain the kernel bandwidths. The approach is assessed on four data sets.

9.
The wide-ranging and rapidly evolving nature of ecological studies means that it is not possible to cover all existing and emerging techniques for analyzing multivariate data. However, two important methods have attracted many followers: Canonical Correspondence Analysis (CCA) and STATICO analysis. Despite the particular characteristics of each, they share similarities and differences which, when analyzed properly, can together provide important results complementary to those usually exploited by researchers. On the one hand, CCA is widely used and implemented and solves many problems formulated by ecologists; on the other hand, it has some weaknesses, mainly caused by the restriction it imposes on the number of variables relative to the number of samples. The STATICO method has no such restriction, but it requires the number of variables (species or environmental) to be the same at each time or location. In addition, STATICO can provide more detailed information, since it allows the variability within groups (in time or space) to be visualized. In this study, the data needed to implement these methods are outlined, and the two methods are compared, showing the advantages and disadvantages of each. The ecological data analyzed are a sequence of pairs of ecological tables, in which species abundances and environmental variables are measured at different, specified locations over the course of time.

10.
11.
Generalized discriminant analysis based on distances (total citations: 14; self-citations: 1; citations by others: 13)
This paper describes a method of generalized discriminant analysis based on a dissimilarity matrix to test for differences in a priori groups of multivariate observations. Use of classical multidimensional scaling produces a low‐dimensional representation of the data for which Euclidean distances approximate the original dissimilarities. The resulting scores are then analysed using discriminant analysis, giving tests based on the canonical correlations. The asymptotic distributions of these statistics under permutations of the observations are shown to be invariant to changes in the distributions of the original variables, unlike the distributions of the multi‐response permutation test statistics which have been considered by other workers for testing differences among groups. This canonical method is applied to multivariate fish assemblage data, with Monte Carlo simulations to make power comparisons and to compare theoretical results and empirical distributions. The paper proposes classification based on distances. Error rates are estimated using cross‐validation.

12.
The purpose of this paper is to examine the multiple group (>2) discrimination problem in which the group sizes are unequal and the variables used in the classification are correlated with skewed distributions. Using statistical simulation based on data from a clinical study, we compare the performances, in terms of misclassification rates, of nine statistical discrimination methods. These methods are linear and quadratic discriminant analysis applied to untransformed data, rank transformed data, and inverse normal scores data, as well as fixed kernel discriminant analysis, variable kernel discriminant analysis, and variable kernel discriminant analysis applied to inverse normal scores data. It is found that the parametric methods with transformed data generally outperform the other methods, and the parametric methods applied to inverse normal scores usually outperform the parametric methods applied to rank transformed data. Although the kernel methods often have very biased estimates, the variable kernel method applied to inverse normal scores data provides considerable improvement in terms of total nonerror rate.
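The rank and inverse normal scores transformations compared above can be sketched as follows. The abstract does not say which scoring formula was used, so Blom's scores, Φ⁻¹((r − 3/8)/(n + 1/4)), are assumed here; the constant `c` is a parameter of this sketch, and the skewed sample is hypothetical.

```python
from statistics import NormalDist


def ranks(values):
    """Ranks 1..n, with average ranks assigned to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def inverse_normal_scores(values, c=3 / 8):
    """Rank-based inverse normal transform Phi^-1((r - c) / (n - 2c + 1)).
    c = 3/8 (assumed default) gives Blom's scores."""
    n = len(values)
    inv = NormalDist().inv_cdf
    return [inv((r - c) / (n - 2 * c + 1)) for r in ranks(values)]


values = [0.1, 0.2, 0.4, 0.9, 3.0, 12.0]  # hypothetical right-skewed variable
print([round(s, 3) for s in inverse_normal_scores(values)])
```

Whatever the skewness of the input, the scores are symmetric about zero, which is what makes linear and quadratic discriminant analysis better behaved on the transformed data.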

13.
The variance inflation factor (VIF) is used to detect the presence of linear relationships between two or more independent variables (i.e. collinearity) in the multiple linear regression model. However, the traditionally used VIF definitions encounter some problems when extended to the case of the ridge estimation (RE). This paper presents an extension of the VIF in RE by providing two alternative VIF expressions that overcome these problems in the general case. Some characteristics of these expressions are also presented and compared with the traditional expression. The results are illustrated with an economic example in the case of three independent variables and with a Monte Carlo simulation for the general case.
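For reference, the traditional (least-squares) VIF that the paper extends is the j-th diagonal element of the inverse correlation matrix of the regressors, equivalently 1/(1 − R_j²). The stdlib-only sketch below computes it; it does not implement the two ridge-estimation VIF expressions proposed in the paper, and the three regressors are hypothetical.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """Traditional VIFs: diagonal of the inverse correlation matrix,
    i.e. 1 / (1 - R_j^2) for each regressor j."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]  # roughly x1 + x2: strong collinearity
print([round(v, 1) for v in vif([x1, x2, x3])])
```

With two regressors the formula reduces to 1/(1 − r²); for example, r = 0.8 gives a VIF of about 2.78 for both variables.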

14.
Summary. Varying-coefficient linear models arise from multivariate nonparametric regression, non-linear time series modelling and forecasting, functional data analysis, longitudinal data analysis and others. It has been a common practice to assume that the varying coefficients are functions of a given variable, which is often called an index. To enlarge the modelling capacity substantially, this paper explores a class of varying-coefficient linear models in which the index is unknown and is estimated as a linear combination of regressors and/or other variables. We search for the index such that the derived varying-coefficient model provides the least squares approximation to the underlying unknown multidimensional regression function. The search is implemented through a newly proposed hybrid backfitting algorithm. The core of the algorithm is the alternating iteration between estimating the index through a one-step scheme and estimating coefficient functions through one-dimensional local linear smoothing. The locally significant variables are selected in terms of a combined use of the t-statistic and the Akaike information criterion. We further extend the algorithm for models with two indices. Simulation shows that the methodology proposed has appreciable flexibility to model complex multivariate non-linear structure and is practically feasible with average modern computers. The methods are further illustrated through the Canadian mink–muskrat data in 1925–1994 and the pound–dollar exchange rates in 1974–1983.
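The one-dimensional local linear smoothing step can be sketched for the simplest case, assumed here for illustration: a single coefficient function β(u) in y = β(u)·x with the index u already known. The Gaussian kernel, the bandwidth `h` and the data are assumptions of this sketch; the index search and backfitting iterations of the paper are not shown.

```python
import math


def local_linear_coef(u, x, y, u0, h):
    """Local linear estimate of beta(u0) in y = beta(u) * x + noise,
    using a Gaussian kernel with bandwidth h."""
    # Weighted normal equations for regressors z1 = x and z2 = x * (u - u0):
    # beta(u) is approximated by b0 + b1 * (u - u0) near u0.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for ui, xi, yi in zip(u, x, y):
        w = math.exp(-0.5 * ((ui - u0) / h) ** 2)
        z1, z2 = xi, xi * (ui - u0)
        s11 += w * z1 * z1
        s12 += w * z1 * z2
        s22 += w * z2 * z2
        t1 += w * z1 * yi
        t2 += w * z2 * yi
    det = s11 * s22 - s12 * s12
    # Cramer's rule; b0 is the estimate of beta(u0).
    return (s22 * t1 - s12 * t2) / det


# Hypothetical noise-free data from y = (1 + u) * x with a known index u.
u = [i / 20 for i in range(21)]
x = [1 + (i % 3) for i in range(21)]
y = [(1 + ui) * xi for ui, xi in zip(u, x)]
print(round(local_linear_coef(u, x, y, u0=0.5, h=0.1), 6))
```

Because the local model is linear in u, a coefficient function that is itself linear is reproduced exactly at every u0, including the boundary of the data, which is one reason local linear smoothing is preferred to local constant fits here.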

15.
This work introduces specific tools based on phi-divergences to select and check generalized linear models with binary data. A backward selection criterion that helps to reduce the number of explanatory variables is considered. Diagnostic methods based on divergence measures such as a new measure to detect leverage points and two indicators to detect influential points are introduced. As an illustration, the diagnostics are applied to human psychology data.
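As background for the phi-divergence tools above, the sketch below computes the Csiszár divergence D_φ(p, q) = Σ qᵢ φ(pᵢ/qᵢ) between observed and fitted Bernoulli distributions, using the Cressie-Read power family for φ. The choice of that family, the weighting by the number of trials, and the example data are assumptions of this sketch; the paper's selection criterion and leverage/influence measures are not reproduced.

```python
import math


def cressie_read_phi(t, lam):
    """Cressie-Read generator phi_lambda(t) for t >= 0 (t > 0 if lam = -1)."""
    if abs(lam) < 1e-12:       # limit lam -> 0: Kullback-Leibler
        return t * math.log(t) - t + 1 if t > 0 else 1.0
    if abs(lam + 1) < 1e-12:   # limit lam -> -1
        return -math.log(t) + t - 1
    return (t ** (lam + 1) - t - lam * (t - 1)) / (lam * (lam + 1))


def phi_divergence(p, q, lam=1.0):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i) for discrete distributions."""
    return sum(qi * cressie_read_phi(pi / qi, lam) for pi, qi in zip(p, q))


def binary_divergence(successes, trials, p_fit, lam=1.0):
    """Sum over covariate patterns of n_j * D_phi between the observed
    proportions (y/n, 1 - y/n) and the fitted probabilities (p, 1 - p)."""
    total = 0.0
    for y, n, pf in zip(successes, trials, p_fit):
        total += n * phi_divergence((y / n, 1 - y / n), (pf, 1 - pf), lam)
    return total


# Hypothetical covariate patterns: successes, trials, fitted probabilities.
print(binary_divergence([3, 8, 5], [10, 12, 9], [0.35, 0.6, 0.5], lam=2 / 3))
```

Twice this quantity is the Cressie-Read power-divergence statistic: λ = 1 gives Pearson's X² and λ → 0 gives the likelihood-ratio deviance, so one family covers the usual goodness-of-fit measures for binary GLMs.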

16.
Cluster analysis methods are based on measures of 'distance' between objects. Sometimes the objects have an internal structure, and use of this can be made when defining such distances. This leads to non-standard cluster analysis methods. We illustrate with an application in which the objects are themselves classes and the aim is to produce clusters of classes which minimize the error rate of a supervised classification rule. For supervised classification problems with more than a handful of classes, there may exist groups of classes which are well separated from other groups, even though individual classes are not all well separated. In such cases, the overall misclassification rate is a crude measure of performance and more subtle measures, taking note of subgroup separation, are desirable. The fact that points can be assigned accurately to groups, if not to individual classes, can sometimes be practically useful.

17.
The restrictive properties of compositional data, that is multivariate data with positive parts that carry only relative information in their components, call for special care to be taken while performing standard statistical methods, for example, regression analysis. Among the special methods suitable for handling this problem is the total least squares procedure (TLS, orthogonal regression, regression with errors in variables, calibration problem), performed after an appropriate log-ratio transformation. The difficulty or even impossibility of deeper statistical analysis (confidence regions, hypothesis testing) using the standard TLS techniques can be overcome by a calibration solution based on linear regression. This approach can be combined with standard statistical inference, for example, confidence and prediction regions and bounds, hypothesis testing, etc., suitable for interpretation of results. Here, we deal with the simplest TLS problem, where we assume a linear relationship between two errorless measurements of the same object (substance, quantity). We propose an iterative algorithm for estimating the calibration line and also give confidence ellipses for the location of unknown errorless results of measurement. Moreover, illustrative examples from the fields of geology, geochemistry and medicine are included. It is shown that the iterative algorithm converges to the same values as those obtained using the standard TLS techniques. Fitted lines and confidence regions are presented for both original and transformed compositional data. The paper contains basic principles of linear models and addresses many related problems.
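The standard TLS line that the iterative algorithm is said to converge to has a closed form: the slope minimizing squared perpendicular distances is b = (s_yy − s_xx + √((s_yy − s_xx)² + 4s_xy²)) / (2s_xy). The sketch below applies it after an additive log-ratio (alr) transformation; alr is only one possible "appropriate log-ratio transformation" and is assumed here, as are the two-part compositions from two hypothetical measurement methods. Confidence ellipses and the paper's calibration algorithm are not shown.

```python
import math
from statistics import mean


def tls_line(x, y):
    """Orthogonal (total least squares) fit of y = a + b*x, minimizing the
    sum of squared perpendicular distances. Returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("TLS slope is undefined when s_xy = 0")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


def alr(composition):
    """Additive log-ratio coordinates ln(x_i / x_D), last part as reference."""
    ref = composition[-1]
    return [math.log(v / ref) for v in composition[:-1]]


# Hypothetical two-part compositions (p, 1 - p) measured by two methods.
method_a = [0.10, 0.25, 0.40, 0.60, 0.75]
method_b = [0.12, 0.24, 0.43, 0.58, 0.77]
xa = [alr([p, 1 - p])[0] for p in method_a]
xb = [alr([p, 1 - p])[0] for p in method_b]
a, b = tls_line(xa, xb)
print(f"calibration line in log-ratio coordinates: y = {a:.3f} + {b:.3f} x")
```

Unlike ordinary least squares, the TLS fit treats both measurements symmetrically, which matches the calibration setting where neither method is taken as error-free in practice.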

18.
An exact permutation test for analyzing and/or dredging multi-response data at the ordinal or higher levels is presented. The associated test statistic is based on the average distance (or any specified norm) between points within a priori disjoint subgroups of a finite population of points in an r-dimensional space (corresponding to r measured responses from each object in a finite population of objects). Alternative approximate tests based on the beta and normal distributions are provided. Two detailed examples utilizing actual social science data are considered, including comparisons of the approximate tests. An additional example describes the behavior of these tests under a variety of conditions, including extreme data configurations.
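A minimal sketch of the statistic described above: the weighted average of within-group mean pairwise distances, where small values indicate compact a priori groups. Two assumptions are made here: group weights n_g/N (one common choice), and a Monte Carlo permutation p-value in place of the exact enumeration or the beta/normal approximations. Euclidean distance is the default; any other metric can be passed as `dist`. The points are hypothetical.

```python
import math
import random


def mrpp_delta(points, labels, dist=math.dist):
    """Sum over groups of (n_g / N) times the average within-group pairwise
    distance. Each group needs at least 2 points."""
    groups = {}
    for pt, g in zip(points, labels):
        groups.setdefault(g, []).append(pt)
    n = len(points)
    delta = 0.0
    for members in groups.values():
        k = len(members)
        total = sum(dist(members[i], members[j])
                    for i in range(k) for j in range(i + 1, k))
        delta += (k / n) * total / (k * (k - 1) / 2)
    return delta


def mrpp_test(points, labels, n_perm=999, seed=0, dist=math.dist):
    """Monte Carlo permutation p-value: the share of relabelings whose
    delta is at least as small as the observed one."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels, dist)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mrpp_delta(points, shuffled, dist) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 2-D responses for two a priori groups.
base = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
pts = base + [(x + 10, y + 10) for x, y in base]
labs = [0] * 6 + [1] * 6
print(mrpp_test(pts, labs))
```

For small data sets the exact test would enumerate all distinct relabelings instead of sampling them; the sampled version trades exactness for a cost that does not grow combinatorially.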

19.
Summary. Non-hierarchical clustering methods are frequently based on the idea of forming groups around 'objects'. The main exponent of this class of methods is the k-means method, where these objects are points. However, clusters in a data set may often be due to certain relationships between the measured variables. For instance, we can find linear structures such as straight lines and planes, around which the observations are grouped in a natural way. These structures are not well represented by points. We present a method that searches for linear groups in the presence of outliers. The method is based on the idea of impartial trimming. We search for the 'best' subsample containing a proportion 1 − α of the data and the best k affine subspaces fitting to those non-discarded observations by measuring discrepancies through orthogonal distances. The population version of the sample problem is also considered. We prove the existence of solutions for the sample and population problems together with their consistency. A feasible algorithm for solving the sample problem is described as well. Finally, some examples showing how the method proposed works in practice are provided.

20.
An exploratory tool is introduced to examine potential non-linear relationships between two sets of variables, X and Y, in a sample of multivariate data. Simulated annealing is applied to find canonical coefficient vectors a and b such that a squared non-linear correlation between a'X and b'Y is maximized. A measure of non-linear correlation is developed for this optimization which utilizes a nearest-neighbor regression estimate for the unknown functional relationship. In addition to examining potential relations between the canonical variables, this method can identify the important variables in each set.
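The non-linear correlation measure can be sketched as follows, under assumptions: E[b'Y | a'X] is estimated by leave-one-out k-nearest-neighbor regression (k = 5 is an arbitrary choice here), and the squared non-linear correlation is taken as 1 − SSE/SST of that fit. The simulated-annealing search over a and b is not shown; the coefficient vectors and data below are hypothetical and fixed by hand.

```python
from statistics import mean


def knn_fit(u, v, k=5):
    """Leave-one-out k-nearest-neighbour estimate of E[v | u] at each u_i."""
    fitted = []
    for i, ui in enumerate(u):
        nbrs = sorted((abs(uj - ui), j) for j, uj in enumerate(u) if j != i)[:k]
        fitted.append(mean(v[j] for _, j in nbrs))
    return fitted


def nonlinear_r2(u, v, k=5):
    """Squared non-linear correlation: 1 - SSE/SST of the kNN fit of v on u."""
    fitted = knn_fit(u, v, k)
    mv = mean(v)
    sse = sum((vi - fi) ** 2 for vi, fi in zip(v, fitted))
    sst = sum((vi - mv) ** 2 for vi in v)
    return 1 - sse / sst


def project(rows, coef):
    """Canonical variate scores a'x for each observation (row)."""
    return [sum(c * xi for c, xi in zip(coef, row)) for row in rows]


# Hypothetical data: Y depends on X only through the square of x1 + x2.
X = [(t / 50, -t / 100) for t in range(-50, 51)]
Y = [((x1 + x2) ** 2,) for x1, x2 in X]
u, v = project(X, (1.0, 1.0)), project(Y, (1.0,))
print(round(nonlinear_r2(u, v), 3))
```

In this example the Pearson correlation between u and v is essentially zero because the relationship is symmetric, while the nearest-neighbor measure is close to one; an optimizer such as simulated annealing would search over a and b to maximize this quantity.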


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP registration No. 09084417