Similar literature
20 similar articles retrieved.
1.
We first compare correspondence analysis, which uses chi-square distance, and an alternative approach using Hellinger distance, for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these two approaches. This coefficient can be decomposed into several components, one component for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. We also make comparisons with the logratio approach based on compositional data. These three methods of representation can produce quite similar results. Two illustrative examples are given.
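A minimal NumPy sketch of the two representations being compared, on a toy table (all numbers illustrative, not from the paper). The Hellinger variant follows one common formulation based on square-rooted row profiles; the paper's similarity coefficient itself is not reproduced here.

```python
import numpy as np

# Toy contingency table (rows x columns); any nonnegative counts work.
N = np.array([[30, 10, 5],
              [12, 25, 8],
              [6, 9, 20]], dtype=float)

P = N / N.sum()                       # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)   # row / column masses

# Classical CA: SVD of the standardized residuals (chi-square metric).
S_chi = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d_chi, Vt = np.linalg.svd(S_chi, full_matrices=False)
F_chi = (U * d_chi) / np.sqrt(r)[:, None]      # principal row coordinates

# Hellinger-distance variant: square roots of row profiles, centred at
# their mass-weighted mean, analysed with the same row weights.
H = np.sqrt(P / r[:, None])                    # sqrt of row profiles
H_c = H - r @ H                                # weighted centring
S_hel = np.sqrt(r)[:, None] * H_c
Uh, d_hel, Vth = np.linalg.svd(S_hel, full_matrices=False)
F_hel = (Uh * d_hel) / np.sqrt(r)[:, None]

print("chi-square row coordinates:\n", F_chi[:, :2])
print("Hellinger row coordinates:\n", F_hel[:, :2])
```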

2.
Multivariate correspondence analysis models for categorical data and their applications
The modelling and analysis of categorical data is one of the most important classes of applied problems in market research; academia and the market research industry generally analyse such data with contingency tables and statistical tests. This paper therefore introduces the correspondence analysis method proposed by the French statistician J.P. Benzécri into empirical market research, so that the modelling of the multivariate categorical data common in questionnaire surveys yields robust statistical conclusions and an intuitive two-dimensional display. The approach is extended to the generalized multivariate correspondence analysis setting; the conclusions are reliable and the application is novel.

3.
Over the past half-century correspondence analysis has grown from a little-known statistical technique designed to graphically depict the association structure of categorical variables that form a contingency table into a very popular tool used in a wide variety of disciplines. Despite this growth, correspondence analysis remains relatively unknown in some parts of the world, including the Australasian statistical community. This paper provides a non-technical, bibliographic exploration of correspondence analysis. We take a step back to view the development of this statistical technique and provide a brief account of its genealogy with a selection of over 270 key publications that have contributed to its growth. We also look at its maturity over the decades.

4.
We review and extend some statistical tools that have proved useful for analysing functional data. Functional data analysis is primarily designed for the analysis of random trajectories and infinite-dimensional data, and there is a need for the development of adequate statistical estimation and inference techniques. While this field is in flux, some methods have proven useful. These include warping methods, functional principal component analysis, and conditioning under Gaussian assumptions for the case of sparse data. The latter is a recent development that may provide a bridge between functional and more classical longitudinal data analysis. Besides presenting a brief review of functional principal components and functional regression, we develop some concepts for estimating functional principal component scores in the sparse situation. An extension of the so-called generalized functional linear model to the case of sparse longitudinal predictors is proposed. This extension includes functional binary regression models for longitudinal data and is illustrated with data on primary biliary cirrhosis.
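A sketch of functional principal component analysis for densely observed trajectories, using simulated curves on a common grid; the sparse-data conditioning step discussed above is not attempted here, and all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)                      # common dense grid
n = 100
# Simulated trajectories: mean curve + two random components + noise.
X = (np.sin(2 * np.pi * t)
     + rng.normal(size=(n, 1)) * np.cos(np.pi * t)
     + rng.normal(size=(n, 1)) * np.sin(3 * np.pi * t) * 0.5
     + rng.normal(scale=0.1, size=(n, t.size)))

mu = X.mean(axis=0)
Xc = X - mu
dt = t[1] - t[0]
C = (Xc.T @ Xc) / n                            # discretized covariance surface
vals, vecs = np.linalg.eigh(C * dt)            # eigen-analysis with quadrature weight
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
phi = vecs / np.sqrt(dt)                       # eigenfunctions, L2-normalized
scores = Xc @ phi * dt                         # FPC scores by numerical integration

print("variance explained:", (vals[:3] / vals.sum()).round(3))
```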

5.
Principal component and correspondence analysis can both be used as exploratory methods for representing multivariate data in two dimensions. Circumstances are noted under which the (possibly inappropriate) application of principal components to untransformed compositional data approximates a correspondence analysis of the raw data. Aitchison (1986) has proposed a method for the principal component analysis of compositional data involving transformation of the raw data. It is shown how this can be approximated by a correspondence analysis of appropriately transformed data. The latter approach may be preferable when there are zeroes in the data.
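A short sketch of Aitchison's approach as described above, on invented compositions: centred log-ratio transformation followed by an ordinary PCA. The log transform requires strictly positive parts, which is exactly why the abstract notes the correspondence-analysis route when zeroes occur.

```python
import numpy as np

# Toy compositions: rows sum to 1 and are strictly positive.
X = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.1, 0.2, 0.7]])

# Aitchison's approach: centred log-ratio (clr) transform, then PCA.
L = np.log(X)
clr = L - L.mean(axis=1, keepdims=True)        # subtract row geometric mean (in logs)
clr_c = clr - clr.mean(axis=0)                 # column-centre for PCA
U, d, Vt = np.linalg.svd(clr_c, full_matrices=False)
scores = U * d                                  # PC scores of the compositions

print(scores[:, :2])
```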

6.
In mining operations, effective maintenance scheduling is very important because of its effect on equipment performance and production costs. Classifying equipment on the basis of repair durations is one of the essential steps in scheduling maintenance activities effectively. In this study, repair data for the electric cable shovels used by the Western Coal Company, Turkey, were analysed using correspondence analysis to classify the shovels in terms of repair duration. Correspondence analysis, which is particularly helpful for analysing cross-tabular data in the form of numerical frequencies, provided a graphical display that permitted more rapid interpretation and understanding of the repair data. The results indicated that the shovels fall into five groups according to repair duration. In particular, shovels 2, 3, 7, 10 and 11 required repair durations of less than 1 h and were in relatively good service condition compared with the others. Priority might therefore be given to repairing them in maintenance job scheduling even if another failed shovel is waiting to be serviced. This type of information will help mine managers increase the number of shovels available in operation.

7.
Functional logistic regression is becoming more popular, as there are many situations in which we are interested in the relation between functional covariates (as input) and a binary response (as output). Several approaches have been advocated, and this paper goes into detail about three of them: dimension reduction via functional principal component analysis, penalized functional regression, and wavelet expansions combined with least absolute shrinkage and selection operator (LASSO) penalization. We discuss the performance of the three methods on simulated data and also apply the methods to data on lameness detection in horses. The emphasis is on classification performance, but we also discuss estimation of the unknown parameter function.
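A hedged sketch of the first of the three approaches, FPCA-based dimension reduction feeding a logistic regression; the penalized and wavelet-LASSO variants are not shown, and all data below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 40)
n = 200
# Simulated functional covariates driven by two latent scores.
s = rng.normal(size=(n, 2))
X = s[:, [0]] * np.sin(np.pi * t) + s[:, [1]] * np.cos(np.pi * t)
X += rng.normal(scale=0.2, size=X.shape)
y = (s[:, 0] + 0.5 * s[:, 1] + rng.normal(size=n) > 0).astype(int)

# Dimension reduction: FPCA of the curves, keep a few leading scores.
Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
Z = U[:, :k] * d[:k]                 # FPC scores as logistic-regression inputs

clf = LogisticRegression().fit(Z, y)
print("in-sample accuracy:", round(clf.score(Z, y), 3))
```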

8.
The problem of component choice in regression-based prediction has a long history. The main cases where important choices must be made are functional data analysis and problems in which the explanatory variables are relatively high-dimensional vectors. Indeed, principal component analysis has become the basis for methods of functional linear regression. In this context the number of components can also be interpreted as a smoothing parameter, so the viewpoint differs a little from that for standard linear regression. However, arguments for and against conventional component-choice methods are relevant to both settings and have received significant recent attention. We give a theoretical argument, applicable in a wide variety of settings, justifying the conventional approach. Although our result is of minimax type, it is not asymptotic in nature; it holds for each sample size. Motivated by the insight gained from this analysis, we give theoretical and numerical justification for cross-validation choice of the number of components used for prediction. In particular we show that cross-validation leads to asymptotic minimization of mean summed squared error, in settings which include functional data analysis.
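A sketch of the cross-validation rule the paper justifies, in the standard linear-regression setting: pick the number of principal components by cross-validated prediction error. The simulated design and the scikit-learn pipeline are illustrative only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, p = 150, 30
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p)) * 0.3   # correlated design
beta = np.zeros(p); beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(size=n)

# Choose the number of components by cross-validated prediction error.
cv_mse = []
for k in range(1, 16):
    pcr = make_pipeline(PCA(n_components=k), LinearRegression())
    mse = -cross_val_score(pcr, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    cv_mse.append(mse)
best_k = int(np.argmin(cv_mse)) + 1
print("CV-selected number of components:", best_k)
```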

9.
Data resulting from behavioral dental research, usually categorical or discretized and having unknown measurement and distributional characteristics, often cannot be analyzed with classical multivariate techniques. A nonlinear principal components technique called multiple correspondence analysis is presented, together with its corresponding computer program, which can handle this kind of data. The model is described as a form of multidimensional scaling. The technique is applied in order to establish which factors are associated with an individual's preference for preservation of the teeth.
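A minimal sketch of multiple correspondence analysis as the correspondence analysis of an indicator (disjunctive) matrix, on a toy dental-style questionnaire; the variable names are invented and the program referred to in the abstract is not reproduced.

```python
import numpy as np
import pandas as pd

# Toy categorical survey data.
df = pd.DataFrame({
    "brush": ["daily", "daily", "rarely", "weekly", "daily", "rarely"],
    "visits": ["regular", "never", "never", "regular", "regular", "never"],
    "keep_teeth": ["yes", "yes", "no", "yes", "yes", "no"],
})

Z = pd.get_dummies(df).to_numpy(float)   # indicator (disjunctive) matrix
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (U * d) / np.sqrt(r)[:, None]   # individuals in low-dimensional space
print(row_coords[:, :2])
```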

10.
In principal component analysis (PCA), it is crucial to know how many principal components (PCs) should be retained in order to account for most of the data variability. A class of “objective” rules for finding this quantity is the class of cross-validation (CV) methods. In this work we compare three CV techniques, showing how the performance of these methods depends on the covariance matrix structure. Finally we propose a rule for the choice of the “best” CV method and give an application to real data.
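One simple member of the CV class discussed above, sketched here as element-wise ("speckled") hold-out: hide random entries, refit a rank-k approximation by alternating SVD imputation, and score the hidden cells. This is a generic illustration, not any of the three specific techniques the paper compares.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k_true = 120, 10, 3
X = rng.normal(size=(n, k_true)) @ rng.normal(size=(k_true, p)) \
    + rng.normal(scale=0.3, size=(n, p))

def masked_press(X, k, frac=0.1, n_iter=50, seed=0):
    """Element-wise CV: hide entries, impute with a rank-k SVD fit, score them."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac          # entries treated as missing
    Xw = np.where(mask, X.mean(), X)           # crude start values
    for _ in range(n_iter):                    # EM-style alternating imputation
        U, d, Vt = np.linalg.svd(Xw, full_matrices=False)
        Xhat = (U[:, :k] * d[:k]) @ Vt[:k]
        Xw = np.where(mask, Xhat, X)           # refresh only the hidden cells
    return ((X[mask] - Xhat[mask]) ** 2).sum()

errors = {k: masked_press(X, k) for k in range(1, 8)}
print("CV-selected number of PCs:", min(errors, key=errors.get))
```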

11.
This empirical paper presents a number of functional modelling and forecasting methods for predicting very short-term (such as minute-by-minute) electricity demand. The proposed functional methods slice a seasonal univariate time series (TS) into a TS of curves and reduce the dimensionality of the curves by functional principal component analysis before applying a univariate TS forecasting method and regression techniques. As data points in the daily electricity demand are observed sequentially, a forecast updating method can greatly improve the accuracy of point forecasts. Moreover, we present a non-parametric bootstrap approach to construct and update prediction intervals, and compare the point and interval forecast accuracy with some naive benchmark methods. The proposed methods are illustrated with half-hourly electricity demand from Monday to Sunday in South Australia.
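A toy end-to-end sketch of the slicing-plus-FPCA idea on simulated half-hourly demand. The score forecast below is a naive random-walk stand-in for the univariate TS methods and regression techniques the paper actually uses, and no prediction intervals are constructed.

```python
import numpy as np

rng = np.random.default_rng(4)
obs_per_day, n_days = 48, 60              # half-hourly demand over 60 days
tt = np.arange(obs_per_day * n_days)
demand = (100 + 20 * np.sin(2 * np.pi * tt / obs_per_day)
          + np.repeat(rng.normal(scale=5, size=n_days), obs_per_day)
          + rng.normal(scale=2, size=tt.size))

# Slice the univariate series into a (days x within-day) matrix of curves.
curves = demand.reshape(n_days, obs_per_day)
mu = curves.mean(axis=0)
U, d, Vt = np.linalg.svd(curves - mu, full_matrices=False)
k = 2
scores = U[:, :k] * d[:k]                 # one score series per component

# Forecast each score series one day ahead (naive random walk here; the
# paper applies proper univariate TS forecasting methods to the scores).
next_scores = scores[-1]
forecast = mu + next_scores @ Vt[:k]      # tomorrow's demand curve
print(forecast[:5].round(1))
```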

12.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic for performing correspondence analysis when a two-way contingency table consists of one ordinal categorical variable. This article will explore the development of correspondence analysis using a decomposition of Taguchi's statistic.
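For orientation, a hedged sketch of one weighted cumulative chi-squared form in the family to which Taguchi's statistic belongs, for a table with ordered columns; the exact variant and the CA-style decomposition used in the article may differ.

```python
import numpy as np

def taguchi_T(N):
    """One weighted cumulative chi-squared (Taguchi/Nair-type) statistic for
    a table whose columns are ordered categories; exact variants differ."""
    N = np.asarray(N, float)
    row_tot = N.sum(axis=1)
    n = N.sum()
    Cum = np.cumsum(N, axis=1)[:, :-1]         # cumulative counts, drop last cut
    D = Cum.sum(axis=0) / n                    # marginal cumulative proportions
    diff = Cum / row_tot[:, None] - D          # row vs marginal cumulative profile
    return ((row_tot[:, None] * diff ** 2) / (D * (1 - D))).sum()

N = [[10, 20, 30, 40],
     [25, 25, 25, 25],
     [40, 30, 20, 10]]
print(round(taguchi_T(N), 2))
```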

13.
We consider the joint analysis of two matched matrices which have common rows and columns, for example, multivariate data observed at two time points or split according to a dichotomous variable. Methods of interest include principal component analysis for interval-scaled data, correspondence analysis for frequency data, log-ratio analysis of compositional data and linear biplots in general, all of which depend on the singular value decomposition. A simple result in matrix algebra shows that by setting up the two matched matrices in a particular block format, the matrix sum and difference components can be analysed using a single application of the singular value decomposition algorithm. The methodology is applied to data from the International Social Survey Program comparing male and female attitudes to working wives across eight countries. The resulting biplots optimally display the overall cross-cultural differences as well as the male-female differences. The case of more than two matched matrices is also discussed.
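A numerical check of the matrix-algebra result described above, under the assumption that the block format is [[X, Y], [Y, X]] scaled by 1/2: the singular values of the block matrix are exactly those of the sum and difference components pooled together.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 4))          # e.g. male responses
Y = rng.normal(size=(6, 4))          # matched female responses

# The symmetric block structure splits into (X + Y)/2 and (X - Y)/2
# eigenspaces, so one SVD of B yields both analyses at once.
B = 0.5 * np.block([[X, Y],
                    [Y, X]])
sv_block = np.linalg.svd(B, compute_uv=False)

sv_sum = np.linalg.svd((X + Y) / 2, compute_uv=False)
sv_diff = np.linalg.svd((X - Y) / 2, compute_uv=False)

print(np.allclose(np.sort(sv_block)[::-1],
                  np.sort(np.concatenate([sv_sum, sv_diff]))[::-1]))
```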

14.
A statistical testing framework for correspondence analysis
Because its results are easy to read, correspondence analysis has found increasingly wide application in recent years. To apply it more soundly, this paper proposes a system of statistical tests for correspondence analysis, including tests of its applicability and tests of the quality of its results, and also notes other issues that deserve attention when correspondence analysis is applied.
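A minimal sketch of an applicability check of the kind proposed: test independence with the chi-square statistic before running CA, and report the total inertia. This illustrates the idea only; the paper's full testing system is not reproduced.

```python
import numpy as np
from scipy.stats import chi2_contingency

N = np.array([[25, 10, 5],
              [10, 30, 12],
              [8, 14, 26]])

# A significant chi-square test of independence suggests there is
# association worth displaying with CA; total inertia (chi-square / n)
# summarizes how much association there is to display.
chi2, pval, dof, _ = chi2_contingency(N)
inertia = chi2 / N.sum()
print(f"chi2 = {chi2:.2f}, p = {pval:.4f}, total inertia = {inertia:.4f}")
```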

15.
Graphical methods have played a central role in the development of statistical theory and practice. This presentation briefly reviews some of the highlights in the historical development of statistical graphics and gives a simple taxonomy that can be used to characterize the current use of graphical methods. This taxonomy is used to describe the evolution of the use of graphics in some major statistical and related scientific journals.

Some recent advances in the use of graphical methods for statistical analysis are reviewed, and several graphical methods for the statistical presentation of data are illustrated, including the use of multicolor maps.

16.
17.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters owing to the dimensionality-reduction property of MCA, which allows the user to impute a wide range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared with the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as with the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main-effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming on high-dimensional data sets than the other multiple imputation methods.
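A toy single-imputation sketch of the underlying principal-component idea: alternate a low-rank SVD fit of the indicator matrix with a refresh of the missing cells. The actual MIMCA procedure adds regularization and a non-parametric bootstrap to produce multiple imputations, neither of which is shown; all data and names below are invented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "v1": rng.choice(["a", "b", "c"], 200),
    "v2": rng.choice(["x", "y"], 200),
    "v3": rng.choice(["low", "mid", "high"], 200),
})
df.loc[rng.random(200) < 0.15, "v2"] = np.nan   # inject missingness

D = pd.get_dummies(df)                  # rows missing v2 get all-zero v2 dummies
Z = D.to_numpy(float)
miss = df["v2"].isna().to_numpy()
v2_cols = [i for i, c in enumerate(D.columns) if c.startswith("v2_")]

# Initialize missing cells at observed category frequencies, then alternate
# a rank-k SVD fit with a refresh of those cells (a crude EM-style loop).
Z[np.ix_(miss, v2_cols)] = Z[~miss][:, v2_cols].mean(axis=0)
k = 2
for _ in range(30):
    mu = Z.mean(axis=0)
    U, d, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    Zhat = (U[:, :k] * d[:k]) @ Vt[:k] + mu
    Z[np.ix_(miss, v2_cols)] = Zhat[np.ix_(miss, v2_cols)]

# Impute each missing v2 value by the category with the largest fitted score.
labels = [D.columns[j].split("_", 1)[1] for j in v2_cols]
imputed = [labels[j] for j in Z[np.ix_(miss, v2_cols)].argmax(axis=1)]
print(imputed[:10])
```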

18.
This paper deals with the analysis of datasets in which the subjects are described by the estimated means of a p-dimensional variable. Classical statistical methods of data analysis do not treat measurements affected by intrinsic variability, as in the case of estimates, so the heterogeneity induced among subjects by this condition is not taken into account. This paper suggests a way to solve the problem in the context of symbolic data analysis, whose specific aim is to handle data tables in which single-valued measurements are replaced by complex data structures such as frequency distributions, intervals, and sets of values. A principal component analysis is carried out according to this proposal, with a significant improvement in the treatment of the information.
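The paper's specific proposal is not reproduced here; as a related illustration from symbolic data analysis, here is a sketch of the "centres" method for interval-valued PCA, where each subject is displayed as a rectangle rather than a point. All intervals are invented.

```python
import numpy as np

# Interval data: each subject described by [min, max] on p = 2 variables,
# as in symbolic data analysis.
lo = np.array([[1.0, 10.0], [2.5, 12.0], [0.5, 9.0], [3.0, 15.0]])
hi = np.array([[2.0, 14.0], [4.0, 13.5], [1.5, 11.0], [5.0, 18.0]])

# "Centres" method: ordinary PCA of the interval midpoints, then project
# the interval vertices to obtain a rectangle per subject.
mid = (lo + hi) / 2
mid_c = mid - mid.mean(axis=0)
U, d, Vt = np.linalg.svd(mid_c, full_matrices=False)
V = Vt.T

corners = np.stack([lo,
                    np.column_stack([lo[:, 0], hi[:, 1]]),
                    np.column_stack([hi[:, 0], lo[:, 1]]),
                    hi], axis=1)                      # (subjects, 4 corners, 2)
proj = (corners - mid.mean(axis=0)) @ V[:, :2]
rect_min, rect_max = proj.min(axis=1), proj.max(axis=1)
print(rect_min, rect_max, sep="\n")
```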

19.
The ever-widening income gap among residents is a major socio-economic problem currently facing China; factors such as the urban-rural dual structure, regional disparities, industry monopolies and irregular income keep China's Gini coefficient persistently high and the income gap widening. Using principal component analysis, regression analysis and related methods, the paper ranks the importance of the factors driving the widening income gap; the analysis concludes that the severe undervaluation of labour among the factors of production is a major driver; and it proposes a combined "one mainstay, two wings" package of economic policy recommendations for narrowing or controlling China's income gap.

20.
In statistical practice, rectangular tables of numeric data are commonplace, and are often analyzed using dimension-reduction methods like the singular value decomposition and its close cousin, principal component analysis (PCA). This analysis produces score and loading matrices representing the rows and the columns of the original table, and these matrices may be used both for prediction and to gain structural understanding of the data. In some tables, the data entries are necessarily nonnegative (apart, perhaps, from some small random noise), and so the matrix factors meant to represent them should arguably also contain only nonnegative elements. This thinking, and the desire for parsimony, underlies such techniques as rotating factors in a search for “simple structure.” These attempts to transform score or loading matrices of mixed sign into nonnegative, parsimonious forms are, however, indirect and at best imperfect. The recent development of nonnegative matrix factorization, or NMF, is an attractive alternative. Rather than attempt to transform a loading or score matrix of mixed signs into one with only nonnegative elements, it directly seeks matrix factors containing only nonnegative elements. The resulting factorization often leads to substantial improvements in interpretability of the factors. We illustrate this potential with synthetic examples and a real dataset. The question of exactly when NMF is effective is not fully resolved, but some indicators of its domain of success are given. It is pointed out that the NMF factors can be used in much the same way as those coming from PCA for tasks such as ordination, clustering, and prediction. Supplementary materials for this article are available online.
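A short scikit-learn sketch contrasting the two factorizations on synthetic nonnegative data: PCA loadings come out with mixed signs, while NMF returns nonnegative factors directly. The data and settings are illustrative, not the article's examples.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(7)
W_true = rng.random((100, 3))
H_true = rng.random((3, 20))
X = W_true @ H_true + 0.01 * rng.random((100, 20))   # nonnegative table

pca = PCA(n_components=3).fit(X)
print("PCA loadings mixed-sign?", (pca.components_ < 0).any())

nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0).fit(X)
W, H = nmf.transform(X), nmf.components_
print("NMF factors nonnegative?", (W >= 0).all() and (H >= 0).all())
print("relative fit error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```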
