
1.

An RKHS framework for functional data analysis

Ana Kupresanin Hyejin Shin David King R.L. Eubank 《Journal of statistical planning and inference》2010

Linear combinations of random variables play a crucial role in multivariate analysis. Two extensions of this concept are considered for functional data and shown to coincide using the Loève–Parzen reproducing kernel Hilbert space representation of a stochastic process. This theory is then used to provide an extension of the multivariate concept of canonical correlation. A solution to the regression problem of best linear unbiased prediction is obtained from this abstract canonical correlation formulation. The classical identities of Lawley and Rao that lead to canonical factor analysis are also generalized to the functional data setting. Finally, the relationship between Fisher's linear discriminant analysis and canonical correlation analysis for random vectors is extended to include situations with function-valued random elements. This allows for classification using the canonical Y scores and related distance measures.

2.

Linear and non-linear canonical correlation analysis: an exploratory tool for the analysis of group-structured data (total citations: 1; self-citations: 1; citations by others: 0)

K. Luijtens F. Symons M. Vuylsteke-Wauters 《Journal of applied statistics》1994,21(3):43-61

Confronted with multivariate group-structured data, one is in fact always interested in describing differences between groups. In this paper, canonical correlation analysis (CCA) is used as an exploratory data analysis tool to detect and describe differences between groups of objects. CCA allows for the construction of Gabriel biplots, relating representations of objects and variables in the plane that best represents the distinction of the groups of object points. In the case of non-linear CCA, transformations of the original variables are suggested to achieve a better group separation compared with that obtained by linear CCA. One can detect which (transformed) variables are responsible for this separation. The separation itself might be due to several characteristics of the data (e.g. distances between the centres of gravity of the original or transformed groups of object points, or differences in the structure of the original groups). Four case studies give an overview of an exploration of the possibilities offered by linear and non-linear CCA.

3.

Canonical Correlation Analysis Using Small Number of Samples

Hye-Seung Lee 《Communications in Statistics - Simulation and Computation》2013,42(5):973-985

Canonical correlation assesses the relationship between two groups of variables. Although it has been a useful tool in a wide variety of research areas, it is not well known that weaker canonical correlations require larger sample sizes to be correctly inferred. In this article, we investigate small sample bias in canonical correlation analysis and apply the jackknife bias correction to the estimation of canonical correlations. We use bootstrap samples to obtain a better confidence interval for the jackknife canonical correlation estimator.
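
As a rough illustration of the jackknife bias correction mentioned in this abstract, the sketch below computes sample canonical correlations and a leave-one-out jackknife estimate of the first one. It is not the article's own procedure: the function names `canonical_correlations` and `jackknife_first_canonical_correlation` are hypothetical, the QR/SVD route to the canonical correlations is an assumed implementation choice, and both data matrices are assumed to have full column rank with more rows than columns.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y.

    Assumes X and Y have the same number of rows and full column rank.
    """
    # Centre each variable so that inner products correspond to covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two centred column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations,
    # returned in decreasing order; clip to guard against rounding.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def jackknife_first_canonical_correlation(X, Y):
    """Jackknife bias-corrected estimate of the first canonical correlation."""
    n = X.shape[0]
    full = canonical_correlations(X, Y)[0]
    # Recompute the estimate with each observation left out in turn.
    leave_one_out = np.array([
        canonical_correlations(np.delete(X, i, axis=0),
                               np.delete(Y, i, axis=0))[0]
        for i in range(n)
    ])
    # Standard jackknife correction: n * theta_hat - (n - 1) * mean(theta_(i)).
    return n * full - (n - 1) * leave_one_out.mean()
```

Because the correction is a linear extrapolation, the jackknife estimate is not guaranteed to stay inside [0, 1]; any truncation would be a further modelling choice.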

4.

Interpretation of Canonical Discriminant Functions, Canonical Variates, and Principal Components (total citations: 1; self-citations: 0; citations by others: 1)

Alvin C. Rencher 《The American statistician》2013,67(3):217-225

Canonical discriminant functions are defined here as linear combinations that separate groups of observations, and canonical variates are defined as linear combinations associated with canonical correlations between two sets of variables. In standardized form, the coefficients in either type of canonical function provide information about the joint contribution of the variables to the canonical function. The standardized coefficients can be converted to correlations between the variables and the canonical function. These correlations generally alter the interpretation of the canonical functions. For canonical discriminant functions, the standardized coefficients are compared with the correlations, with partial t and F tests, and with rotated coefficients. For canonical variates, the discussion includes standardized coefficients, correlations between variables and the function, rotation, and redundancy analysis. Various approaches to interpretation of principal components are compared: the choice between the covariance and correlation matrices, the conversion of coefficients to correlations, the rotation of the coefficients, and the effect of special patterns in the covariance and correlation matrices.

5.

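An extension study of canonical correlation analysis

Du Zifang, Chang Zhiyong 《Statistics & Information Forum》2014,(5):3-7

In canonical correlation analysis, obtaining the expressions for the canonical variates does not complete the task; for example, the number of canonical variates still has to be determined and variables still have to be selected. Regarding the number of canonical variates, shortcomings of the commonly used chi-square test and redundancy analysis are identified, and a new algorithm is proposed. Regarding the selection of the original variables, three possible approaches are proposed. Finally, the conclusions are verified using human body measurement data.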

6.

Parallel analysis approach for determining dimensionality in canonical correlation analysis

《Journal of Statistical Computation and Simulation》2012,82(17):3419-3431

ABSTRACT

Canonical correlations are maximized correlation coefficients indicating the relationships between pairs of canonical variates that are linear combinations of the two sets of original variables. The number of non-zero canonical correlations in a population is called its dimensionality. Parallel analysis (PA) is an empirical method for determining the number of principal components or factors that should be retained in factor analysis. An example is given to illustrate how the proposed procedures, based on PA and bootstrap-modified PA, are adapted to the context of canonical correlation analysis (CCA). The performance of the proposed procedures is evaluated in a simulation study by comparison with traditional sequential test procedures with respect to the under-, correct- and over-determination of dimensionality in CCA.

7.

Using restricted CCA for cross-language information retrieval

Emil Polajnar 《Communications in Statistics - Simulation and Computation》2017,46(6):4618-4626

Canonical correlation analysis is a method for finding linear relationships between two sets of variables. When not every linear combination of the variables is allowed, restricted canonical correlation analysis is appropriate. The method was implemented with alternating least squares and applied to cross-language information retrieval on a dataset of officially translated and aligned documents in eight European languages.

8.

Variable selection and interpretation in canonical correlation analysis

Noriah M. Al-Kandari Ian T Jolliffe 《Communications in Statistics - Simulation and Computation》2013,42(3):873-900

The canonical variates in canonical correlation analysis are often interpreted by looking at the weights or loadings of the variables in each canonical variate and effectively ignoring those variables whose weights or loadings are small. It is shown that such a procedure can be misleading. The related problem of selecting a subset of the original variables which preserves the information in the most important canonical variates is also examined. Because of different possible definitions of ‘the information in canonical variates’, any such subset selection needs very careful consideration.

9.

Nonlinear measures of association with kernel canonical correlation analysis and applications

Su-Yun Huang Mei-Hsien Lee Chuhsing Kate Hsiao 《Journal of statistical planning and inference》2009

Measures of association between two sets of random variables have long been of interest to statisticians. The classical canonical correlation analysis (LCCA) can characterize, but also is limited to, linear association. This article introduces a nonlinear and nonparametric kernel method for association study and proposes a new independence test for two sets of variables. This nonlinear kernel canonical correlation analysis (KCCA) can also be applied to the nonlinear discriminant analysis. Implementation issues are discussed. We place the implementation of KCCA in the framework of classical LCCA via a sequence of independent systems in the kernel associated Hilbert spaces. Such a placement provides an easy way to carry out the KCCA. Numerical experiments and comparison with other nonparametric methods are presented.

10.

Using Lasso RCCA for cross-language information retrieval

Emil Polajnar 《Communications in Statistics - Simulation and Computation》2013,42(9):2739-2748

ABSTRACT

Restricted canonical correlation analysis and the lasso shrinkage method were paired together for canonical correlation analysis with non-negativity restrictions on datasets where the sample size is much smaller than the number of variables. The method was implemented in an alternating least-squares algorithm and applied to cross-language information retrieval on a dataset with aligned documents in eight languages. A set of experiments was run to evaluate the method and compare it to other methods in the field.

11.

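Canonical correlation analysis of the net household income and living consumption expenditure of urban residents in China

Yang Dan, Bu Shengjuan 《Statistics & Information Forum》2005,20(2):96-99

The article introduces canonical correlation analysis, a statistical method for analysing the correlation between two sets of random variables, and then uses this method, with the aid of the SAS software, to study the relationship between net household income and living consumption expenditure of urban residents in China's 31 provinces, municipalities and autonomous regions.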

12.

Hierarchical clustering of variables: a comparison among strategies of analysis

Gabriele Soffritti 《Communications in Statistics - Simulation and Computation》2013,42(4):977-999

In this paper some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus a new multivariate association measure is proposed, based on the links existing between canonical correlation analysis and principal component analysis, which can be more suitably used for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and theoretically discussed.

13.

Comparison of Efficiency of Stratified and Unequal Probability Sampling

Marcin Kozak Andrzej Zieliński 《Communications in Statistics - Simulation and Computation》2013,42(4):807-816

In this article, the choice of the optimum sampling design for studying a finite population is examined. Three sampling schemes are compared, viz., Sunter's procedure of unequal probability sampling, stratified sampling under optimum stratification, and simple random sampling without replacement. The comparison is made against a background of various levels of correlation between stratification and survey variables and various levels of variability in the variables. Under weak correlation and large variability, stratification appeared to be more efficient than Sunter's procedure. Under strong correlation and/or low variability in the variables, the latter procedure was the most efficient. Simple random sampling was usually the least efficient.

14.

A generalization of the bivariate Beta-Binomial distribution

M.J. Olmo-Jiménez A.M. Martínez-Rodríguez A. Conde-Sánchez J. Rodríguez-Avi 《Journal of statistical planning and inference》2011,141(7):2303-2311

This paper presents a new bivariate discrete distribution that generalizes the bivariate Beta-Binomial distribution. It is generated by the Appell hypergeometric function F₁ and can be obtained as a Binomial mixture with an Exton's Generalized Beta distribution. The model has different marginal distributions which are, together with the conditional distributions, more flexible than the Beta-Binomial distribution. It has non-linear regression curves and is useful for random variables with positive correlation. These features make the model well suited to fitting observed data, as the two applications included show.

15.

On a measure of dependence based on Fisher's information matrix

K. Zografos 《Communications in Statistics - Theory and Methods》2013,42(7):1715-1728

A class of measures of dependence between two random vectors is defined in terms of the canonical correlations obtained from Fisher's information matrix. Some basic properties are proved for this class of measures. Examples are given to illustrate that the class gives good measures under normal models. Interesting measures also arise for bivariate models where the correlation coefficient does not exist for some values of the parameters of the model.

16.

Identifying Nonlinear Relationships in Regression using the ACE Algorithm

Duolao Wang Michael Murphy 《Journal of applied statistics》2005,32(3):243-258

This paper introduces an alternating conditional expectation (ACE) algorithm: a non-parametric approach for estimating the transformations that lead to the maximal multiple correlation of a response and a set of independent variables in regression and correlation analysis. These transformations can give the data analyst insight into the relationships between these variables so that these can be best described and non-linear relationships uncovered. Using the Bayesian information criterion (BIC), we show how to find the best closed-form approximations for the optimal ACE transformations. By means of ACE and BIC, the model fit can be considerably improved compared with the conventional linear model, as demonstrated in the two simulated and two real datasets in this paper.

17.

A new space–time multivariate approach for environmental data analysis

Sandra De Iaco 《Journal of applied statistics》2011,38(11):2471-2483

Air quality control usually requires a monitoring system of multiple indicators measured at various points in space and time. Hence, the use of space–time multivariate techniques is of fundamental importance in this context, where decisions and actions regarding environmental protection should be supported by studies based on both inter-variable relations and spatial–temporal correlations. This paper describes how canonical correlation analysis can be combined with space–time geostatistical methods for analysing two spatial–temporal correlated aspects, such as air pollution concentrations and meteorological conditions. Hourly averages of three pollutants (nitric oxide, nitrogen dioxide and ozone) and three atmospheric indicators (temperature, humidity and wind speed) taken for two critical months (February and August) at several monitoring stations are considered and space–time variograms for the variables are estimated. Simultaneous relationships between such sample space–time variograms are determined through canonical correlation analysis. The most correlated canonical variates are used for describing synthetically the underlying space–time behaviour of the components of the two sets.

18.

Categorical multiblock linear discriminant analysis

Philippe Casin 《Journal of applied statistics》2018,45(8):1396-1409

Credit scoring techniques have been developed in recent years in order to reduce the risk taken by banks and financial institutions in the loans that they grant. Credit scoring is the problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination when the dependent variable is categorical and when a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components which take into account both relationships between explanatory categorical variables and canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application on credit scoring data are provided.

19.

An approximated distribution of the Gini's rank association coefficient

G. Landenna A. Scagni M. Boldrini 《Communications in Statistics - Theory and Methods》2013,42(6):2017-2026

An approximate distribution is proposed for the Gini's rank association coefficient g which is, like Kendall's and Spearman's rank correlation coefficients, a statistic to test independence between two random variables. The proposed distribution can be simply transformed into a Student's t distribution, so hypothesis testing is made much easier.

20.

Peter Bühlmann Philipp Rütimann Sara van de Geer Cun-Hui Zhang 《Journal of statistical planning and inference》2013

We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for cluster-representatives or the group Lasso based on the structure from the clusters. Regarding the first step, we present a novel bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also present some theoretical arguments that canonical correlation based clustering leads to a better-posed compatibility constant for the design matrix, which ensures identifiability and an oracle inequality for the group Lasso. Furthermore, we discuss circumstances where using cluster-representatives with the Lasso as the subsequent estimator leads to improved results for prediction and detection of variables. We complement the theoretical analysis with various empirical results.