Similar literature
20 similar articles found.
1.
Multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analyzing a two-way contingency table. In more complex cases, there is more than one predictor variable. In this paper, MNSCA, along with the decomposition of the Gray–Williams Tau index into main effects and an interaction term, is used to analyze a contingency table with two categorical predictor variables and an ordinal response variable. The Multiple-Tau index is a measure of association that contains both main effects and an interaction term. The main effects represent the change in the response variable due to a change in the levels/categories of the predictor variables, considering the effects of their addition, while the interaction effect represents the combined effect of the categorical predictor variables on the ordinal response variable. Moreover, for ordinal-scale variables, we propose a further decomposition in order to check for the existence of power components by using Emerson's orthogonal polynomials.
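As a point of reference for the Tau index mentioned above, the following minimal sketch computes the two-way Goodman-Kruskal tau, the measure that the Gray–Williams multiple index generalizes. It is an illustration, not the paper's procedure: the function name `goodman_kruskal_tau` and the layout (rows as response categories, columns as predictor categories) are assumptions made here. With several predictors, one common reading is to treat each combination of predictor categories as a single column; that reading is also an assumption here.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the row variable from the column variable.

    table[i][j] is the count for response category i and predictor category j.
    (Hypothetical helper for illustration; not taken from the abstract.)
    """
    n = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    # Marginal proportions of the response (rows) and the predictor (columns).
    row_p = [sum(table[i]) / n for i in range(n_rows)]
    col_p = [sum(table[i][j] for i in range(n_rows)) / n for j in range(n_cols)]

    # Denominator: total heterogeneity (Gini index) of the response.
    total = 1.0 - sum(p * p for p in row_p)
    if total == 0.0:
        raise ValueError("response has a single category; tau is undefined")

    # Numerator: reduction in heterogeneity once the predictor category is known.
    conditional = sum(
        (table[i][j] / n) ** 2 / col_p[j]
        for i in range(n_rows)
        for j in range(n_cols)
        if col_p[j] > 0
    )
    return (conditional - sum(p * p for p in row_p)) / total
```

Under independence the index is zero, and it reaches one when the predictor category determines the response category exactly.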

2.
Correspondence analysis (CA) has gained a reputation for being a very useful statistical technique for determining the nature of association between two or more categorical variables. For simple and multiple CA, the singular value decomposition (SVD) is the primary tool; it allows the user to construct a low-dimensional space in which to visualize this association. As an alternative to SVD, one may consider the bivariate moment decomposition (BMD), a method of decomposition that uses orthogonal polynomials to reflect the structure of ordered categorical responses. When the features of BMD are combined with SVD, a hybrid decomposition (HD) is formed. The aim of this paper is to show the applicability of HD when performing simple and multiple CA.

3.
When one has information about a set of individuals on several variables, in different groups or contexts, and a multivariate analysis is applied to each group, the following questions arise: Which groups show a similar response? How do the groups differ? How do the individuals differ in their responses in the different groups? These issues have led us to address a question of considerable practical interest: the comparison and integration of the structures resulting from several multivariate analyses. Here we propose a method for the comparison and integration of the results arising from two Biplot analyses applied to the same variables in two different groups of individuals. By extension, we also develop the case of more than two Biplot analyses. Emphasis is placed on the underlying geometry and the interpretation of results, for which we offer indices that allow us to study the integrated structures and perform comparative analyses.

4.
In this article we develop an extension of categorical analysis of variance for one response and two factors, based on a partitioning of a measure of predictability for three-way contingency tables known as Gray and Williams's index. First, the decomposition of this multiple measure of association into partial association measures is shown. Finally, for ordinal-scale variables, we propose an extension of this decomposition using a particular set of orthogonal polynomials.

5.
We consider Bayesian testing for independence of two categorical variables with covariates for a two-stage cluster sample. This is a difficult problem because we have a complex sample (i.e., a cluster sample), not a simple random sample. Our approach is to convert the cluster sample with covariates into an equivalent simple random sample without covariates, which provides a surrogate of the original sample. This surrogate sample is then used to compute the Bayes factor to make an inference about independence. We apply our methodology to data from the Trends in International Mathematics and Science Study [30] for fourth-grade US students to assess the association between the mathematics and science scores, represented as categorical variables. We show that if there is a strong association between the two categorical variables, there is no significant difference between the tests with and without the covariates. We also performed a simulation study to further understand the effect of covariates in various situations. We found that for borderline cases (moderate association between the two categorical variables), there are noticeable differences between the tests with and without covariates.

6.
In this paper, some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the commonly used bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus, a new multivariate association measure is proposed, based on the links between canonical correlation analysis and principal component analysis, which is better suited to the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally, an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and discussed theoretically.

7.
A multiple regression method based on distance analysis and metric scaling is proposed and studied. This method allows us to predict a continuous response variable from several explanatory variables, is compatible with the general linear model, and is found to be useful when the predictor variables are both continuous and categorical. Real data examples are given to illustrate the results obtained.

8.
Cluster analysis is one of the most widely used methods in statistical analysis, in which homogeneous subgroups are identified in a heterogeneous population. Because mixed continuous and discrete data arise in many applications, some ordinary clustering methods, such as hierarchical methods, k-means and model-based methods, have been extended to the analysis of mixed data. However, in the available model-based clustering methods, the number of parameters increases with the number of continuous variables, and identifying and fitting an appropriate model may be difficult. In this paper, to reduce the number of parameters, a set of parsimonious models is introduced for the model-based clustering of mixed continuous (normal) and nominal data. Models in this set are extended using the general location model approach for modeling the distribution of mixed variables and applying a factor analyzer structure to the covariance matrices. The ECM algorithm is used to estimate the parameters of these models. To show the clustering performance of the proposed models, results from some simulation studies and from the analysis of two real data sets are presented.

9.
Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic in the correspondence analysis context when a two-way contingency table contains at least one ordinal categorical variable. The aim of this paper, considering a contingency table with two ordinal categorical variables, is to show a decomposition of Taguchi's index into linear, quadratic and higher-order components. This decomposition has been developed using Emerson's orthogonal polynomials. Moreover, two case studies are analyzed to illustrate the methodology.

10.
Multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analysing the prediction of a categorical variable through two or more predictor variables placed in a contingency table. In the MNSCA framework, the Multiple-TAU index has been proposed for summarizing the predictability between criterion and predictor variables. However, it cannot be used to test association, and to overcome this limitation a relationship with the C-Statistic has been recommended. The Multiple-TAU index is an overall measure of association that contains both main effects and interaction terms. The main effects represent the change in the response variable due to a change in the levels/categories of the predictor variables, considering the effects of their addition. The interaction effect, on the other hand, represents the combined effect of the predictor variables on the response variable. In this paper, we propose a decomposition of the Multiple-TAU index into main effects and interaction terms. To show this decomposition, we consider an empirical case that examines the relationship between demographic characteristics of the American people, such as race, gender and location (column variables), and their propensity to move (row variable) to a new town to find a job.

11.
Communications in Statistics - Theory and Methods, 2012, 41(13-14): 2342-2355
We propose a distance-based method to relate two data sets. We define and study some measures of multivariate association based on distances between observations. The proposed approach can be used to deal with general data sets (e.g., observations on continuous, categorical or mixed variables). An application, using the Hellinger distance, provides the relationships between two regions of hyperspectral images.
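The abstract above names the Hellinger distance without defining it. As a minimal sketch, the standard Hellinger distance between two discrete distributions can be computed as follows. The function name `hellinger_distance` and the choice to normalize raw non-negative weights are assumptions made here; how the paper builds its association measures from such distances is not shown.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of non-negative weights over the same categories;
    each is normalized to sum to one before the distance is computed.
    (Hypothetical helper for illustration; not taken from the abstract.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution needs positive total weight")
    # Sum of squared differences between the square roots of the probabilities.
    squared = sum(
        (math.sqrt(a / p_total) - math.sqrt(b / q_total)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)
```

The distance is bounded between zero (identical distributions) and one (distributions with disjoint support), which makes it convenient for comparing histograms of different sizes.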

12.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating only a small number of parameters because of the dimensionality reduction property of MCA. It allows the user to impute a wide range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main-effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming on high-dimensional data sets than the other multiple imputation methods.

13.
Credit scoring techniques have been developed in recent years to reduce the risk taken by banks and financial institutions in the loans they grant. Credit scoring is a problem of classifying individuals into one of two groups: defaulting borrowers or non-defaulting borrowers. The aim of this paper is to propose a new method of discrimination for the case in which the dependent variable is categorical and a large number of categorical explanatory variables are retained. This method, Categorical Multiblock Linear Discriminant Analysis, computes components that take into account both the relationships between the explanatory categorical variables and the canonical correlation between each explanatory categorical variable and the dependent variable. A comparison with three other techniques and an application to credit scoring data are provided.

14.
Independent factor analysis (IFA) has recently been proposed in the signal processing literature as a way to model a set of observed variables through linear combinations of latent independent variables and a noise term. A peculiarity of the method is that it defines a probability density function for the latent variables by mixtures of Gaussians. The aim of this paper is to cast the method into a more rigorous statistical framework and to propose some developments. In the first part, we present the IFA model in its population version, address identifiability issues and draw some parallels between the IFA model and the ordinary factor analysis (FA) model. We then show that the IFA model may be reinterpreted as an independent component analysis-based rotation of an ordinary FA solution. We also give evidence that the IFA model represents a special case of a mixture of factor analysers. In the second part, we address inferential issues, deriving the standard errors for the model parameter estimates and providing model selection criteria. Finally, we present some empirical results on real data sets.

15.
A nonparametric inference algorithm developed by Davis and Geman (1983) is extended and applied to a medical prediction problem. The algorithm employs an estimation procedure for acquiring pairwise statistics among variables of a binary data set, allows for the data-driven creation of interaction terms among the variables, and employs a decision rule that asymptotically gives the minimum expected error. The inference procedure was designed for large data sets but has been extended via the method of cross-validation to encompass smaller data sets.

16.
This study proposes a methodological approach for extracting useful knowledge from survey data by performing Bayesian network (BN) modeling and adopting the results of robust coplot analysis as prior knowledge about association patterns hidden in the data. By addressing the issue of BN construction when expert knowledge is limited or unavailable, the proposed approach facilitates the modeling of large data sets describing numerous observed and latent variables. By answering the question of which node(s)/link(s) should be retained in or discarded from a BN, we aim to determine a compact model of the variables while considering the desired properties of the data. The steps of the proposed method are explained on real data extracted from the Turkey Demographic and Health Survey. First, a BN structure is created based solely on the judgment of the analyst. Then the coplot results are employed to update the BN structure, and the model parameters are updated using the updated structure and the data. Loss scores of the BNs are used to verify the success of the updated BN that inherits knowledge from the coplot.

17.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Unlike the others, the proposed method can easily be used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the confidence intervals built for the quantities of interest are often narrower while still ensuring valid coverage.

18.
Identifying the genes and clinical covariates that influence patient survival is crucial because it can lead to a better understanding of the underlying mechanisms of diseases and to better prediction models. Most variable selection methods in penalized Cox models cannot deal properly with categorical variables such as gender and family history. The group lasso penalty can combine clinical and genomic covariates effectively. In this article, we introduce an optimization algorithm for Cox regression with a group lasso penalty. We compare our method with other methods on simulated and real microarray data sets.
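To illustrate why a group lasso penalty suits categorical covariates, the sketch below shows group-wise soft-thresholding, the standard proximal step for that penalty. It is not the optimization algorithm the article introduces: the function name `group_soft_threshold`, the index-list representation of groups, and the use of one unweighted threshold `lam` for every group (implementations often scale it by the square root of the group size) are all assumptions made here.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Apply group-wise soft-thresholding to a coefficient vector.

    coefs  : list of floats, the current coefficients.
    groups : list of index lists; the dummy variables of one categorical
             covariate would share a group.
    lam    : non-negative threshold, applied unweighted to every group
             (an assumption of this sketch).
    (Hypothetical helper for illustration; not taken from the abstract.)
    """
    result = list(coefs)
    for indices in groups:
        # Euclidean norm of the coefficients belonging to this group.
        norm = math.sqrt(sum(coefs[k] ** 2 for k in indices))
        # Shrink the whole group toward zero; drop it if the norm is below lam.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for k in indices:
            result[k] = coefs[k] * scale
    return result
```

For example, `group_soft_threshold([3.0, 4.0, 0.5], [[0, 1], [2]], 1.0)` shrinks the first group to approximately (2.4, 3.2) and sets the third coefficient to zero. All coefficients in a group are kept or removed together, so a categorical variable is selected as a whole rather than one level at a time.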

19.
Mixture separation for mixed-mode data (total citations: 3; self-citations: 0; citations by others: 3)
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.

20.
It is known that patients may cease participating in a longitudinal study and become lost to follow-up. The objective of this article is to present a Bayesian model for estimating malaria transition probabilities that accounts for individuals lost to follow-up. We consider a homogeneous population, and the period of time considered is assumed to be short enough to rule out two or more transitions from one state of health to another. The proposed model is based on a Gibbs sampling algorithm that uses information on individuals lost to follow-up at the end of the longitudinal study. To simulate the unknown number of individuals with positive and negative malaria states who were lost to follow-up at the end of the study, two latent variables were introduced into the model. We used a real data set and a simulated data set to illustrate the application of the methodology. The proposed model showed a good fit to these data sets, and the algorithm did not show problems of convergence or lack of identifiability. We conclude that the proposed model is a good alternative for estimating the probabilities of transition from one state of health to another in studies with low adherence to follow-up.
