Similar literature
1.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic for performing correspondence analysis when a two-way contingency table consists of one ordinal categorical variable. This article will explore the development of correspondence analysis using a decomposition of Taguchi's statistic.

2.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic in the correspondence analysis context when a two-way contingency table contains at least one ordinal categorical variable. The aim of this paper, which considers a contingency table with two ordinal categorical variables, is to show a decomposition of Taguchi's index into linear, quadratic and higher-order components. This decomposition has been developed using Emerson's orthogonal polynomials. Moreover, two case studies have been analyzed to illustrate the methodology.
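
As a rough illustration of the statistic discussed in the two abstracts above, the following sketch computes Taguchi's cumulative chi-squared statistic in the form commonly given in the literature: T = sum over s = 1..J-1 of [1 / (d_s (1 - d_s))] * sum over i of n_i. (Z_is / n_i. - d_s)^2, where Z_is is the cumulative count of row i over the first s ordered columns and d_s = Z_.s / n. The function names and the example table are hypothetical, and the code does not reproduce the decompositions developed in either paper.

```python
import numpy as np

def taguchi_statistic(table):
    """Taguchi's cumulative chi-squared statistic for an I x J table
    whose columns are ordered categories (assumed form, see lead-in)."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    row_tot = N.sum(axis=1)                   # n_i.
    Z = np.cumsum(N, axis=1)[:, :-1]          # Z_is for s = 1..J-1
    d = Z.sum(axis=0) / n                     # d_s, cumulative column proportions
    w = 1.0 / (d * (1.0 - d))                 # Taguchi's weights
    dev = Z / row_tot[:, None] - d            # Z_is / n_i. - d_s
    return float(np.sum(w * np.sum(row_tot[:, None] * dev**2, axis=0)))

def pearson_chi2(table):
    """Ordinary Pearson chi-squared statistic, for comparison."""
    N = np.asarray(table, dtype=float)
    E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return float(np.sum((N - E) ** 2 / E))

# Made-up 3 x 3 table with ordered columns (illustrative data only).
table = np.array([[20, 15, 5],
                  [10, 15, 15],
                  [5, 10, 25]])
T = taguchi_statistic(table)
X2 = pearson_chi2(table)
print(f"Taguchi T = {T:.3f}, Pearson X2 = {X2:.3f}")
```

With these weights, T equals the sum of the Pearson chi-squared statistics of the J-1 tables obtained by collapsing the ordered columns into "up to s" versus "above s", which is one way to see why it is sensitive to ordering.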

3.
Over the past half a century correspondence analysis has grown from a little known statistical technique designed to graphically depict the association structure of categorical variables that form a contingency table to a very popular tool used in a wide variety of disciplines. Despite this growth, correspondence analysis remains relatively unknown in some parts of the world, including the Australasian statistical community. This paper provides a non-technical, bibliographic exploration of correspondence analysis. We take a step back to view the development of this statistical technique and provide a brief account of its genealogy with a selection of over 270 key publications that have contributed to its growth. We also look at its maturity over the decades.

4.
Multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analysing the prediction of a categorical variable through two or more predictor variables placed in a contingency table. In the MNSCA framework, the Multiple-TAU index has been proposed for summarizing the predictability between criterion and predictor variables. However, it cannot be used to test association, and to overcome this limitation a relationship with the C-statistic has been recommended. The Multiple-TAU index is an overall measure of association that contains both main effects and interaction terms. The main effects represent the change in the response variables due to the change in the levels/categories of the predictor variables, considering the effects of their addition. On the other hand, the interaction effect represents the combined effect of predictor variables on the response variable. In this paper, we propose a decomposition of the Multiple-TAU index into main effects and interaction terms. In order to show this decomposition, we consider an empirical case in which the relationship between the demographic characteristics of the American people, such as race, gender and location (column variables), and their propensity to move (row variable) to a new town to find a job is considered.
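
To make the overall index concrete, here is a minimal sketch that assumes the Multiple-TAU index takes the Gray and Williams form, that is, Goodman and Kruskal's tau with the two predictors cross-classified: tau = (sum_ijk p_ijk^2 / p_.jk - sum_i p_i..^2) / (1 - sum_i p_i..^2). The function name and the counts are hypothetical, and the split into main effects and interaction proposed in the paper is not reproduced here.

```python
import numpy as np

def multiple_tau(table):
    """Overall multiple tau for a three-way table of counts.

    table[i, j, k]: axis 0 is the response, axes 1 and 2 are the predictors.
    Assumed form: Goodman-Kruskal tau with the predictors cross-classified.
    """
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    p_resp = P.sum(axis=(1, 2))               # p_i.. (response margin)
    p_pred = P.sum(axis=0)                    # p_.jk (joint predictor margin)
    # p_ijk^2 / p_.jk, skipping empty predictor cells
    ratio = np.divide(P**2, p_pred, out=np.zeros_like(P), where=p_pred > 0)
    num = ratio.sum() - np.sum(p_resp**2)
    den = 1.0 - np.sum(p_resp**2)
    return float(num / den)

# Made-up 2 x 2 x 2 counts (illustrative data only).
counts = np.array([[[30, 10], [20, 5]],
                   [[10, 25], [15, 35]]])
print(f"multiple tau = {multiple_tau(counts):.3f}")
```

Under this form the index is 0 when the response is independent of the joint predictor and 1 when the predictor combination determines the response exactly.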

5.
Multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analyzing a two-way contingency table. In more complex cases, there is more than one predictor variable. In this paper, MNSCA, along with the decomposition of the Gray–Williams Tau index into main effects and an interaction term, is used to analyze a contingency table with two predictor categorical variables and an ordinal response variable. The Multiple-Tau index is a measure of association that contains both main effects and an interaction term. The main effects represent the change in the response variables due to the change in the levels/categories of the predictor variables, considering the effects of their addition, while the interaction effect represents the combined effect of the predictor categorical variables on the ordinal response variable. Moreover, for ordinal scale variables, we propose a further decomposition in order to check the existence of power components by using Emerson's orthogonal polynomials.

6.
Correspondence analysis is a versatile statistical technique that allows the user to graphically identify the association that may exist between variables of a contingency table. For two categorical variables, the classical approach involves applying singular value decomposition to the Pearson residuals of the table. These residuals allow one to use a simple test to determine those cells that deviate from what is expected under independence. However, the assumptions concerning these residuals are not always satisfied and so such results can lead to questionable conclusions. One may consider instead an adjustment of the Pearson residual, which is known to have properties associated with the standard normal distribution. This paper explores the application of these adjusted residuals to correspondence analysis and determines how they impact upon the configuration of points in the graphical display.
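
The classical approach described above can be sketched as follows: form the standardized (Pearson) residuals of the correspondence matrix, apply SVD, and scale the singular vectors into principal coordinates. The adjusted residual shown is the usual one that divides the Pearson residual by sqrt((1 - r_i)(1 - c_j)); how the paper feeds these into the analysis is not stated in the abstract, so that part is only computed, not plotted. All function names and the example table are hypothetical.

```python
import numpy as np

def pearson_residuals(table):
    """Standardized residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j)."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                                  # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return S, r, c, n

def adjusted_residuals(table):
    """Adjusted residuals: sqrt(n) * S_ij / sqrt((1 - r_i)(1 - c_j))."""
    S, r, c, n = pearson_residuals(table)
    return np.sqrt(n) * S / np.sqrt(np.outer(1.0 - r, 1.0 - c))

def correspondence_analysis(table):
    """Classical simple CA via SVD of the standardized residuals."""
    S, r, c, n = pearson_residuals(table)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]         # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]      # column principal coordinates
    return F, G, sv, n

# Made-up 3 x 3 table (illustrative data only).
table = np.array([[20, 15, 5],
                  [10, 15, 15],
                  [5, 10, 25]])
F, G, sv, n = correspondence_analysis(table)
print("total inertia:", float(np.sum(sv**2)))
print("first two row coordinates:\n", F[:, :2])
print("adjusted residuals:\n", adjusted_residuals(table))
```

The total inertia (the sum of squared singular values) multiplied by n is the Pearson chi-squared statistic of the table, which ties the graphical display back to the test of independence.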

7.
It is well known that the Pearson statistic \(\chi^{2}\) can perform poorly in studying the association between ordinal categorical variables. Taguchi’s and Hirotsu’s statistics have been introduced in the literature as simple alternatives to Pearson’s chi-squared test for contingency tables with ordered categorical variables. The aim of this paper is to shed new light on these statistics, stressing their characteristics and providing new interpretations of them. Moreover, a theoretical scheme is developed showing the links between the different proposals and classes of cumulative chi-squared statistical tests, starting from a unifying index of heterogeneity, unalikeability and variability measures. Users of statistics may find this scheme helpful for understanding the different proposals. Some decompositions of both statistics are also highlighted. This paper presents a case study of optimizing the polysilicon deposition process in a very large-scale integrated circuit, to identify the optimal combination of factor levels. It is obtained by means of the information coming from a correspondence analysis based on Taguchi’s statistic and regression models for binary dependent variables. A new optimal combination of factor levels is obtained, different from many others proposed in the literature for these data.

8.
The perception of food in Europe has been a topic of research for many years due to its importance in better understanding the role of food in helping to define the culture of a country. It is also important from a marketing perspective for identifying how consumers relate to food. Recently, this topic was discussed by Guerrero et al. (2010) who used a graphical statistical technique called correspondence analysis to identify the association between the countries that participated in the study and words that were linked with “Traditional” food. This paper explores the use of non-symmetrical correspondence analysis and provides an interpretation of the configuration of points in the graphical display in terms of its first four moments. In particular, we will focus on the skewness and kurtosis of such a configuration. Such measures provide further detail on the nature of the association between the countries studied and the words linked with “Traditional” food.

9.
This paper describes a proposal for the extension of the dual multiple factor analysis (DMFA) method developed by Lê and Pagès [15] to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on the transformation of categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA allows us to obtain its global (considering all the individuals) and partial (considering each set of individuals) coordinates in a factor space. This visual analysis allows us to compare the sets of individuals to identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures and the other using real data from a voting intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data collection methods, while similar, are not entirely equivalent.

10.
Correspondence analysis (CA) has gained a reputation for being a very useful statistical technique for determining the nature of association between two or more categorical variables. For simple and multiple CA, the singular value decomposition (SVD) is the primary tool used and allows the user to construct a low-dimensional space to visualize this association. As an alternative to SVD, one may consider the bivariate moment decomposition (BMD), a method of decomposition that involves using orthogonal polynomials to reflect the structure of ordered categorical responses. When the features of BMD are combined with SVD, a hybrid decomposition (HD) is formed. The aim of this paper is to show the applicability of HD when performing simple and multiple CA.

11.
We consider Bayesian testing for independence of two categorical variables with covariates for a two-stage cluster sample. This is a difficult problem because we have a complex sample (i.e. cluster sample), not a simple random sample. Our approach is to convert the cluster sample with covariates into an equivalent simple random sample without covariates, which provides a surrogate of the original sample. Then, this surrogate sample is used to compute the Bayes factor to make an inference about independence. We apply our methodology to the data from the Trends in International Mathematics and Science Study [30] for fourth grade US students to assess the association between the mathematics and science scores represented as categorical variables. We show that if there is strong association between two categorical variables, there is no significant difference between the tests with and without the covariates. We also performed a simulation study to further understand the effect of covariates in various situations. We found that for borderline cases (moderate association between the two categorical variables), there are noticeable differences between the tests with and without covariates.

12.
Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait models and this provides an efficient model fitting strategy. The mixture of latent trait analyzers model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and voting in the U.S. Congress. The model is shown to yield intuitive clustering results and it gives a much better fit than either latent class analysis or latent trait analysis alone.

13.
In this article we develop an extension of categorical analysis of variance for one response and two factors, based on a partitioning of a measure of predictability for three-way contingency tables, known as Gray and Williams's index. First, the decomposition of this multiple measure of association into partial association measures is shown. Finally, for ordinal-scale variables, we propose an extension of this decomposition using a particular set of orthogonal polynomials.

14.
This note discusses a problem that might occur when forward stepwise regression is used for variable selection and among the candidate variables is a categorical variable with more than two categories. Most software packages (such as SAS, SPSSx, BMDP) include special programs for performing stepwise regression. The user of these programs has to code categorical variables with dummy variables. In this case the forward selection might wrongly indicate that a categorical variable with more than two categories is nonsignificant. This is a disadvantage of the forward selection compared with the backward elimination method. A way to avoid the problem would be to test in a single step all dummy variables corresponding to the same categorical variable rather than one dummy variable at a time, such as in the analysis of covariance. This option, however, is not available in forward stepwise procedures, except for stepwise logistic regression in BMDP. A practical possibility is to repeat the forward stepwise regression and change the reference categories each time.
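
The single-step test mentioned in the abstract above can be illustrated with a partial F-test that enters all dummy variables of one categorical predictor together. This is a minimal sketch on simulated data, not the procedure of any particular package; the function names, the simulated effect sizes and the random seed are assumptions made for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def joint_f_statistic(X_reduced, X_full, y):
    """Partial F statistic for the columns in X_full that are not in X_reduced.

    Assumes both design matrices have full column rank and X_full nests X_reduced.
    """
    n = len(y)
    q = X_full.shape[1] - X_reduced.shape[1]   # number of dummies tested jointly
    rss0, rss1 = rss(X_reduced, y), rss(X_full, y)
    return ((rss0 - rss1) / q) / (rss1 / (n - X_full.shape[1]))

# Simulated data: one continuous covariate and a three-level factor (hypothetical).
rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2], 30)
x = rng.normal(size=group.size)
y = 1.0 + 0.5 * x + np.array([0.0, 0.8, -0.8])[group] + rng.normal(size=group.size)

X_reduced = np.column_stack([np.ones_like(x), x])
dummies = (group[:, None] == np.array([1, 2])).astype(float)   # reference level 0
X_full = np.column_stack([X_reduced, dummies])

F = joint_f_statistic(X_reduced, X_full, y)
print(f"joint F on 2 and {len(y) - X_full.shape[1]} degrees of freedom: {F:.2f}")
```

Because the joint test depends only on the column space spanned by the dummies, its value does not change with the choice of reference category, unlike the one-dummy-at-a-time tests the note warns about.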

15.
While at least some standard graphical tools do exist for cardinal time series analysis, little research effort has been directed towards the visualization of categorical time series. The repertoire of such visual methods is almost exclusively restricted to a few isolated proposals from computer science and biology. This article aims to present a toolbox of known and newly developed approaches for analysing given categorical time series data visually. Among these tools, the rate evolution graph, the circle transformation, pattern histograms and control charts are especially promising.

16.
Mixture separation for mixed-mode data   Total citations: 3; self-citations: 0; citations by others: 3
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.

17.
Parameter estimation for association and log-linear models is an important aspect of the analysis of cross-classified categorical data. Classically, iterative procedures, including Newton's method and iterative scaling, have been used to calculate the maximum likelihood estimates of these parameters. An important special case occurs when the categorical variables are ordinal and this has received a considerable amount of attention for more than 20 years. This is because models for such cases involve the estimation of a parameter that quantifies the linear-by-linear association and is directly linked with the natural logarithm of the common odds ratio. The past five years have seen the development of non-iterative procedures for estimating the linear-by-linear parameter for ordinal log-linear models. Such procedures have been shown to lead to numerically equivalent estimates when compared with iterative, maximum likelihood estimates. Such procedures also enable the researcher to avoid some of the computational difficulties that commonly arise with iterative algorithms. This paper investigates and evaluates the performance of three non-iterative procedures for estimating this parameter by considering 14 contingency tables that have appeared in the statistical and allied literature. The estimation of the standard error of the association parameter is also considered.

18.
The analysis of time-indexed categorical data is important in many fields, e.g., in telecommunication network monitoring, manufacturing process control and ecology. Primary interest is in detecting and measuring serial associations and dependencies in such data. For cardinal time series analysis, autocorrelation is a convenient and informative measure of serial association. Yet, for categorical time series analysis an analogous convenient measure and corresponding concepts of weak stationarity have not been provided. For two categorical variables, several ways of measuring association have been suggested. This paper reviews such measures and investigates their properties in a serial context. We discuss concepts of weak stationarity of a categorical time series, in particular of stationarity in association measures. Serial association and weak stationarity are studied in the class of discrete ARMA processes introduced by Jacobs and Lewis (J. Time Ser. Anal. 4(1):19–36, 1983). "An intrinsic feature of a time series is that, typically, adjacent observations are dependent. The nature of this dependence among observations of a time series is of considerable practical interest. Time series analysis is concerned with techniques for the analysis of this dependence." (Box et al. 1994, p. 1)
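
One way to picture a serial association measure of the kind reviewed above is to cross-tabulate the series against its own lagged copy and apply a standard two-variable measure. The sketch below uses Cramér's V for that purpose; this choice, the function name and the example series are assumptions made for illustration and are not necessarily among the measures the paper studies.

```python
import numpy as np

def lagged_cramers_v(series, lag):
    """Cramér's V between X_t and X_{t-lag} for a categorical series (lag >= 1)."""
    x = np.asarray(series)
    cats, codes = np.unique(x, return_inverse=True)
    m = len(cats)
    current, lagged = codes[lag:], codes[:-lag]
    table = np.zeros((m, m))
    np.add.at(table, (current, lagged), 1)     # contingency table of (X_t, X_{t-lag})
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    mask = expected > 0                        # skip categories unseen at this lag
    chi2 = np.sum((table[mask] - expected[mask]) ** 2 / expected[mask])
    return float(np.sqrt(chi2 / (n * (m - 1))))

# A strictly periodic series: each state determines the next one.
periodic = list("abc" * 10)
print("lag 1:", lagged_cramers_v(periodic, 1))

# An independent series for comparison (hypothetical seed and categories).
rng = np.random.default_rng(1)
noise = rng.choice(list("abc"), size=300)
print("lag 1, independent draws:", lagged_cramers_v(noise, 1))
```

Plotting such a value against the lag gives a rough categorical counterpart of the autocorrelation function, with values near 0 suggesting little serial dependence and values near 1 suggesting nearly deterministic dependence.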

19.
To explore the value of correspondence analysis in oncology, we used correspondence analysis to examine the relationship between the amount of food eaten in some cities in China and male gastric carcinoma mortality. According to scatter plots of row and column points, male gastric carcinoma mortality differs by region across the cities of China. Southern male citizens ate more rice and salt, less wheaten food, and fewer light vegetables than northern ones. There may be some carcinogenic factors in some food.

20.
Communications in Statistics - Theory and Methods, 2012, 41(13-14): 2342-2355
We propose a distance-based method to relate two data sets. We define and study some measures of multivariate association based on distances between observations. The proposed approach can be used to deal with general data sets (e.g., observations on continuous, categorical or mixed variables). An application, using Hellinger distance, provides the relationships between two regions of hyperspectral images.
