20 similar documents found (search time: 962 ms).
Compositional tables are a continuous counterpart to the well-known contingency tables. Their cells contain quantitatively expressed relative contributions to a whole. They carry exclusively relative information and are commonly expressed in proportions or percentages. The factors corresponding to the rows and columns of the table can be inspected as with contingency tables, e.g. for their mutual independence. The nature of compositional tables requires a specific geometric treatment, provided by the Aitchison geometry on the simplex. The properties of the Aitchison geometry allow the original table to be decomposed into its independent and interactive parts. Moreover, in the special case of 2×2 compositional tables, easily interpretable orthonormal coordinates (resulting from the isometric logratio transformation) can be constructed for the original table and its decompositions. Consequently, for a sample of compositional tables, both exploratory analysis and statistical inference can be performed. Exploratory analysis includes graphical inspection of the independent and interactive parts; inference includes odds-ratio-like tests of independence. The theoretical developments of the approach are demonstrated on two economic applications.
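
For a 2×2 table with cells x11, x12, x21, x22, we assume here one common choice of orthonormal coordinates. It consists of a row balance, a column balance and half the log odds ratio. The paper's exact sign and ordering conventions may differ. The sketch computes these coordinates as contrasts of the log-cells. Each contrast row sums to zero, so the coordinates do not change when the table is rescaled (e.g. from proportions to percentages).

```python
import numpy as np

# Orthonormal logratio contrasts over the cells (x11, x12, x21, x22), row-major.
# Each row has unit length and sums to zero, so scaling the table has no effect.
CONTRASTS = 0.5 * np.array([
    [1.0, 1.0, -1.0, -1.0],   # z1: balance between the two rows
    [1.0, -1.0, 1.0, -1.0],   # z2: balance between the two columns
    [1.0, -1.0, -1.0, 1.0],   # z3: half the log odds ratio (interaction)
])


def ilr_2x2(table):
    """Return (z1, z2, z3) for a 2x2 table of strictly positive cells."""
    x = np.asarray(table, dtype=float).ravel()
    if x.shape != (4,) or np.any(x <= 0):
        raise ValueError("expected a 2x2 table of strictly positive values")
    return CONTRASTS @ np.log(x)


table = [[0.4, 0.1],
         [0.2, 0.3]]
z = ilr_2x2(table)
```

Under this choice, z3 = 0 exactly when the odds ratio equals 1, i.e. when the table coincides with its independent part.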

The logratio methodology cannot be applied when rounded zeros occur in compositional data. Many methods exist for dealing with rounded zeros, but some are unsuitable for high-dimensional data sets. More recently developed methods fail to balance computation time and accuracy. As a further improvement, we propose a method based on regression imputation with Q-mode clustering. The method forms groups of parts and builds partial least squares regressions on these groups using centered logratio coordinates. We also prove that using centered logratio coordinates or isometric logratio coordinates as the response of the partial least squares regression yields equivalent results for the replacement of rounded zeros. A simulation study and a real-data example are used to evaluate the performance of the proposed method. The results show that the proposed method reduces computation time in higher dimensions and improves the quality of the results.
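
The clr/ilr equivalence rests on a simple identity. Let V be a D × (D − 1) matrix whose orthonormal columns are orthogonal to the vector of ones. Then ilr(x) = clr(x)V and clr(x) = ilr(x)Vᵀ. The sketch checks this numerically. The pivot-coordinate basis used for V is an assumption for illustration; any orthonormal basis of the clr hyperplane would do. The sketch does not implement the proposed imputation itself.

```python
import numpy as np


def clr(X):
    """Centred logratio: log parts minus their row-wise mean log."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)


def pivot_basis(D):
    """D x (D-1) orthonormal basis of pivot (ilr) coordinates in clr space."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        k = D - j - 1                 # number of parts after part j
        c = np.sqrt(k / (k + 1))
        V[j, j] = c                   # part j in the numerator
        V[j + 1:, j] = -c / k         # geometric mean of the remaining parts
    return V


rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(5, 4))   # five 4-part compositions
V = pivot_basis(X.shape[1])
Z_clr = clr(X)
Z_ilr = Z_clr @ V                        # ilr coordinates
back = Z_ilr @ V.T                       # recovers the clr coefficients
```

The entries of each clr vector sum to zero. Consequently V Vᵀ = I − (1/D)11ᵀ leaves clr vectors unchanged, and Z_ilr Z_ilrᵀ equals Z_clr Z_clrᵀ. In other words, both responses share the same Gram matrix.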

Biplots are a widely used statistical tool for visualizing the loadings and scores produced by a dimension reduction technique applied to multivariate data. Some data carry only relative information, i.e. compositional data expressed in proportions, mg/kg, etc. Such data must be pre-processed with a logratio transformation before dimension reduction is carried out. In the context of principal component analysis, the resulting biplot is called a compositional biplot. We introduce an alternative, the ilr biplot, which is based on a special choice of orthonormal coordinates resulting from an isometric logratio (ilr) transformation. This makes it possible to incorporate external non-compositional variables as well, and to study their relations to the compositional variables. The methodology is demonstrated on real data sets.
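
For comparison, the standard compositional biplot mentioned above can be sketched as a PCA of clr-transformed, column-centred data via the singular value decomposition. This is the clr-based construction, not the ilr biplot introduced in the paper. The form-biplot scaling (scores = US, loadings = V) is an assumption.

```python
import numpy as np


def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=1, keepdims=True)


def compositional_biplot(X, k=2):
    """Scores and loadings of a clr-based PCA biplot (form biplot scaling)."""
    Z = clr(X)
    Zc = Z - Z.mean(axis=0)                 # column-centre the clr coefficients
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    scores = U[:, :k] * s[:k]               # row points (observations)
    loadings = Vt[:k].T                     # arrows (one per compositional part)
    return scores, loadings, Zc


rng = np.random.default_rng(1)
X = rng.uniform(0.1, 1.0, size=(10, 4))
scores, loadings, Zc = compositional_biplot(X, k=2)
```

Because clr rows sum to zero, at most D − 1 components carry information. With k = D − 1, the product `scores @ loadings.T` reproduces `Zc` exactly.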

The different parts (variables) of a compositional data set cannot be considered independent of one another, since only the ratios between the parts constitute the relevant information to be analysed. In practice, this information can be captured in a system of orthonormal coordinates. For regressing one part on the other parts, a specific choice of orthonormal coordinates is proposed that allows the regression parameters to be interpreted in terms of the original parts. In this context, orthogonal regression is appropriate, since all compositional parts, including the explanatory variables, are measured with error. Besides classical (least-squares-based) parameter estimation, robust estimation based on robust principal component analysis is also employed. Statistical inference for the regression parameters is obtained by the bootstrap. In the robust version, the fast and robust bootstrap procedure is used. The methodology is illustrated with a macroeconomic data set.
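
Orthogonal (total least squares) regression can be sketched through principal component analysis. The fitted hyperplane passes through the mean and is orthogonal to the eigenvector of the covariance matrix with the smallest eigenvalue. This is a classical, non-robust sketch. It assumes the response and explanatory orthonormal coordinates have already been computed, with the response in the first column, and it omits the bootstrap.

```python
import numpy as np


def orthogonal_regression(Z):
    """Fit z0 = b0 + b1*z1 + ... by orthogonal regression.

    Z: (n, p) array; column 0 is the response coordinate, the rest are
    explanatory coordinates, all assumed to be measured with error.
    """
    Z = np.asarray(Z, dtype=float)
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variance
    if np.isclose(normal[0], 0.0):
        raise ValueError("hyperplane is parallel to the response axis")
    slopes = -normal[1:] / normal[0]
    intercept = mu[0] - mu[1:] @ slopes
    return intercept, slopes
```

The robust variant described in the text would follow the same route with a robust estimate in place of `np.cov`.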

Compositional tables, a continuous counterpart to contingency tables, carry relative information about the relationships between row and column factors. Thus, for their analysis, only the ratios between the cells of a table are informative. Consequently, the standard Euclidean geometry should be replaced by the Aitchison geometry on the simplex, which enables the table to be decomposed into its independent and interactive parts. The aim of the paper is to find interpretable coordinate representations for the independent and interaction tables, in terms of balances and odds ratios of cells, respectively. Further statistical processing of compositional tables can then be performed in these coordinates. The theoretical results are applied to real-world problems from a health survey and from macroeconomics.
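
For a general I×J compositional table, we assume here a common definition of the decomposition. The independent table is built from the geometric means of rows and columns. The interaction table is the cell-wise ratio of the original to the independent table, re-closed to sum to one. The sketch checks two consequences. First, the independent table has all log odds ratios equal to zero. Second, the interaction table keeps the log odds ratios of the original table.

```python
import numpy as np


def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()


def independent_table(x):
    """Product of geometric row and column marginals, closed to sum to 1."""
    L = np.log(np.asarray(x, dtype=float))
    row = L.mean(axis=1, keepdims=True)   # log geometric mean of each row
    col = L.mean(axis=0, keepdims=True)   # log geometric mean of each column
    return closure(np.exp(row + col))


def interaction_table(x):
    """Original table 'minus' its independent part in the Aitchison sense."""
    return closure(np.asarray(x, dtype=float) / independent_table(x))


def log_odds_ratio(x, i, j, k, l):
    """Log odds ratio of cells (i, j), (k, l) against (i, l), (k, j)."""
    x = np.asarray(x, dtype=float)
    return np.log(x[i, j] * x[k, l] / (x[i, l] * x[k, j]))


X = closure([[12.0, 5.0, 3.0],
             [4.0, 9.0, 7.0]])
ind, inter = independent_table(X), interaction_table(X)
```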

We first compare two approaches for representing categorical data in a contingency table: correspondence analysis, which uses the chi-square distance, and an alternative approach based on the Hellinger distance. We propose a coefficient that globally measures the similarity between these two approaches. The coefficient can be decomposed into components, one for each principal dimension. Each component indicates the contribution of that dimension to the difference between the two representations. We also compare both approaches with the logratio approach from compositional data analysis. The three methods of representation can produce quite similar results. Two illustrative examples are given.
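
The two distances compared above can be written for row profiles a and b of a contingency table. The chi-square distance weights squared profile differences by the inverse column masses c. The Hellinger distance compares square-rooted profiles without weights. The sketch uses the unscaled form of the Hellinger distance; conventions differ by a constant factor. It is not the paper's similarity coefficient.

```python
import numpy as np


def row_profiles_and_masses(N):
    N = np.asarray(N, dtype=float)
    profiles = N / N.sum(axis=1, keepdims=True)   # each row sums to 1
    col_masses = N.sum(axis=0) / N.sum()         # average column profile
    return profiles, col_masses


def chi_square_distance(a, b, col_masses):
    return np.sqrt(np.sum((a - b) ** 2 / col_masses))


def hellinger_distance(a, b):
    return np.sqrt(np.sum((np.sqrt(a) - np.sqrt(b)) ** 2))


N = [[10, 20],
     [30, 40]]
P, c = row_profiles_and_masses(N)
d_chi = chi_square_distance(P[0], P[1], c)
d_hel = hellinger_distance(P[0], P[1])
```

Unlike the chi-square distance, the Hellinger distance does not depend on the column masses. This is one reason why rare columns can influence the two maps differently.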

When tables are generated from a data file, releasing those tables should not reveal overly detailed information about individual respondents. Disclosure of individual respondents in the microdata file can be prevented by applying disclosure control methods at the table level (cell suppression or cell perturbation). However, this may create inconsistencies between tables based on the same data file. Alternatively, disclosure control methods can be applied at the microdata level, but these methods may change the data permanently and do not account for specific table properties. These problems can be circumvented by assigning a single, fixed weight factor to each respondent (record) in the microdata file. Normally this weight factor equals 1 for every record and is not explicitly stored in the microdata file. Upon tabulation, each respondent's contribution is multiplied by the respondent's weight factor. This approach is called Source Data Perturbation (SDP) because the data are perturbed at the microdata level, not at the table level. Note, however, that the original microdata are not changed; only a weight variable is added. The weight factors can be chosen in accordance with the statistical disclosure control (SDC) paradigm: the tables generated from the microdata must be safe, and the information loss must be minimized. The paper indicates how this can be done. Moreover, it is shown that the SDP approach is well suited to data warehouses, as the weights can conveniently be stored in the fact tables. The data can then still be accessed, sliced and diced up to a certain level of detail. The tables generated from the data warehouse are mutually consistent and safe.
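
The tabulation step of SDP is simply a weighted sum. The sketch assumes hypothetical record fields (`region`, `sector`, `turnover`) and a weight field `w` that defaults to 1. It does not show how safe weights are chosen, which is the core of the paper. Every table is computed from the same weighted records, so marginal totals agree across tables by construction.

```python
from collections import defaultdict


def weighted_table(records, keys, value, weight="w"):
    """Sum value * weight over records, grouped by the given key fields.

    Records without a weight field get the default weight 1, matching the
    convention that unperturbed records carry weight 1 implicitly.
    """
    table = defaultdict(float)
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        table[cell] += rec.get(weight, 1.0) * rec[value]
    return dict(table)


# Hypothetical microdata: only the second record is perturbed (weight 0.8).
records = [
    {"region": "N", "sector": "A", "turnover": 100.0},
    {"region": "N", "sector": "B", "turnover": 50.0, "w": 0.8},
    {"region": "S", "sector": "A", "turnover": 70.0},
]
by_region_sector = weighted_table(records, ["region", "sector"], "turnover")
by_region = weighted_table(records, ["region"], "turnover")
```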

This paper is concerned with maximum likelihood estimation and the likelihood ratio test for hierarchical loglinear models of multidimensional contingency tables with missing data. The estimation and testing problems for a high-dimensional contingency table can be reduced to those for a class of low-dimensional tables. In some cases, incomplete data in the high-dimensional table become complete in the low-dimensional tables through this reduction. The reduction can also indicate how much the incomplete data contribute to the estimation and the test.
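
A textbook special case shows how reducing to a lower-dimensional table lets partially classified data contribute. It is not the paper's general procedure. Consider a two-way table in which some subjects are classified only by the row variable. Assume the saturated model and data missing at random. Then the row margin can be estimated from all subjects, and the conditional column distribution from the fully classified ones.

```python
import numpy as np


def mle_row_supplemented(complete, row_only):
    """MLE of cell probabilities for a two-way table with row-only extra counts.

    complete: (I, J) counts classified by both variables.
    row_only: length-I counts classified by the row variable only.
    Assumes a saturated multinomial model and ignorable (MAR) missingness.
    """
    n = np.asarray(complete, dtype=float)
    m = np.asarray(row_only, dtype=float)
    total = n.sum() + m.sum()
    row_prob = (n.sum(axis=1) + m) / total            # uses every subject
    col_given_row = n / n.sum(axis=1, keepdims=True)  # uses complete cases only
    return row_prob[:, None] * col_given_row


n = [[20, 10],
     [5, 15]]
m = [10, 30]
p_hat = mle_row_supplemented(n, m)
```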

Compositional data are characterized by values containing relative information, so the ratios between the data values are of interest for the analysis. Owing to these specific features, standard statistical methods should be applied to compositions expressed in a proper coordinate system with respect to an orthonormal basis. We discuss how three-way compositional data can be analyzed with the Parafac model. When the data are contaminated by outliers, robust estimates of the Parafac model parameters should be employed. We demonstrate how robust estimation can be done in the context of compositional data and how the results can be interpreted. A real-data example from macroeconomics underlines the usefulness of this approach.
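
A minimal, non-robust Parafac (CP) fit by alternating least squares is sketched below. It assumes the compositional mode has already been expressed in orthonormal (ilr) coordinates. It omits the robust estimation the text describes. The random initialisation and the fixed iteration count are simplifying choices.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row index j*K + k."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])


def parafac_als(X, rank, n_iter=500, seed=0):
    """Classical CP/Parafac fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r]."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                       # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)    # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(n_iter):
        # Each update is an ordinary least squares solve with two factors fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C


def reconstruct(A, B, C):
    return np.einsum("ir,jr,kr->ijk", A, B, C)
```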

Analysis of a large multidimensional contingency table is quite involved. Models corresponding to layers of a contingency table are easier to analyze than the full model. We obtain relationships between the interaction parameters of the full log-linear model and those of its corresponding layer models. These relationships are useful not only to simplify the analysis but also to interpret various hierarchical models. We derive these relationships for layers of one variable and extend the results to layers of more than one variable. We also establish, under conditional independence, relationships between the interaction parameters of the full model and those of the corresponding marginal models. We discuss the merging of factor levels based on these interaction parameters. Finally, we use the relationships between the layer models and the full model to obtain conditions for level merging based on layer interaction parameters. Several examples illustrate the results.
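
The relationship can be checked directly in the smallest case: a 2×2×2 table with sum-to-zero (effect) coding. Compare the full saturated model with the two layer models, one 2×2 table per level of the third variable. The full-model AB interaction is the average of the layer AB interactions, and the ABC interaction is half their difference. This is an illustrative special case under the stated coding, not the paper's general result.

```python
import numpy as np

SIGNS = np.array([1.0, -1.0])   # effect coding for a two-level factor


def layer_ab_interaction(layer):
    """lambda^{AB}_{11} of the saturated 2x2 model: log odds ratio / 4."""
    L = np.log(np.asarray(layer, dtype=float))
    return np.einsum("ij,i,j->", L, SIGNS, SIGNS) / 4


def full_model_terms(table):
    """lambda^{AB}_{11} and lambda^{ABC}_{111} of the saturated 2x2x2 model."""
    L = np.log(np.asarray(table, dtype=float))
    ab = np.einsum("ijk,i,j->", L, SIGNS, SIGNS) / 8
    abc = np.einsum("ijk,i,j,k->", L, SIGNS, SIGNS, SIGNS) / 8
    return ab, abc
```

In particular, the ABC term vanishes exactly when the two layers have equal odds ratios, which is the homogeneous-association condition.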

Methods for regression on compositional covariates have recently been proposed using the isometric log-ratio (ilr) representation of compositional parts. This approach consists of first applying standard regression to the ilr coordinates and then transforming the estimated ilr coefficients into their contrast log-ratio counterparts. This yields easy-to-interpret parameters indicating the relative effect of each compositional part. In this work we present an extension of this framework in which compositional covariate effects are allowed to be smooth in the ilr domain. This is achieved by fitting a smooth function over the multidimensional ilr space using Bayesian P-splines. Smoothness is imposed through random walk priors on the spline coefficients in a hierarchical Bayesian framework. The proposed methodology is applied to spatial data from an ecological survey of a gypsum outcrop in the Emilia-Romagna Region, Italy.
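
A one-dimensional sketch illustrates the P-spline idea. It uses a cubic B-spline basis on equally spaced knots and a second-order random walk prior on the coefficients. The prior's precision matrix is DᵀD, with D the second-difference matrix. We hold the variance ratio λ fixed, which is an assumption; the paper treats the variances hierarchically and works over a multidimensional ilr space. Under this assumption, the posterior mean of the coefficients solves a penalised least squares problem.

```python
import numpy as np


def bspline_basis(x, xl, xr, n_segments=10, degree=3):
    """Equally spaced B-spline basis via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / n_segments
    knots = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of each knot interval [t_i, t_{i+1}).
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B                                   # shape (len(x), n_segments + degree)


def pspline_fit(x, y, lam=1.0, **basis_kw):
    """Posterior mean of spline coefficients under a Gaussian likelihood
    and an RW2 prior with fixed variance ratio lam."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    B = bspline_basis(x, x.min(), x.max(), **basis_kw)
    nb = B.shape[1]
    D = np.diff(np.eye(nb), n=2, axis=0)       # second-order differences
    K = D.T @ D                                # RW2 prior precision (rank nb - 2)
    alpha = np.linalg.solve(B.T @ B + lam * K, B.T @ y)
    return B @ alpha, alpha
```

The RW2 penalty leaves linear coefficient sequences unpenalised, so a straight line is reproduced exactly regardless of λ.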

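Based on the latest national 42-sector input-output table, we compile novel 2007 national energy and national green energy input-output tables. On this basis, we compare three indicators between the two tables: the influence coefficients, the sensitivity coefficients and the environmental-cost elasticity coefficients of the energy sectors. The results show the following:
- Coal mining, petroleum extraction, thermal power and coking constrain other sectors to a greater degree.
- The "pull" effect of petroleum extraction, petroleum processing, thermal power and coking on other production sectors declines markedly.
- The sensitivity and influence coefficients of natural gas extraction improve noticeably.
- The production costs of coal mining, petroleum extraction, thermal power and petroleum processing are strongly affected by changes in environmental treatment costs.

In light of these findings, policy recommendations are proposed for adjusting China's energy industry structure during the 12th Five-Year Plan period.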
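
The influence and sensitivity coefficients are commonly computed from the Leontief inverse B = (I − A)⁻¹ of the direct input coefficient matrix A. The influence coefficient of sector j is its column sum of B divided by the average column sum. The sensitivity coefficient of sector i is its row sum divided by the average row sum. The sketch uses these standard unweighted definitions. It does not reproduce the energy tables or the environmental-cost elasticity, and the 2-sector matrix is a made-up illustration.

```python
import numpy as np


def linkage_coefficients(A):
    """Influence and sensitivity coefficients from a direct input coefficient matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.linalg.inv(np.eye(n) - A)   # Leontief inverse
    col = L.sum(axis=0)                # output of all sectors per unit final demand for j
    row = L.sum(axis=1)                # output of i when every final demand rises by 1
    influence = col / col.mean()       # > 1: above-average backward linkage
    sensitivity = row / row.mean()     # > 1: above-average forward linkage
    return influence, sensitivity


A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
influence, sensitivity = linkage_coefficients(A)
```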

A general methodology is presented for finding suitable Poisson log-linear models, with applications to multiway contingency tables. Mixtures of multivariate normal distributions are used to model prior opinion when a subset of the regression vector is believed to be nonzero. This prior distribution is studied for two- and three-way contingency tables, in which the regression coefficients are interpretable in terms of odds ratios in the table. Efficient and accurate schemes are proposed for calculating the posterior model probabilities. The methods are illustrated on a large number of simulated two-way tables and on two three-way tables. These methods appear useful for selecting the best log-linear model. They also appear useful for estimating parameters of interest in a way that reflects uncertainty about the true model.
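
Given approximate log marginal likelihoods, posterior model probabilities follow from Bayes' rule, computed stably with the log-sum-exp trick. Purely for illustration, the sketch swaps in the BIC approximation for a two-way table. It compares the independence and saturated Poisson log-linear models. It does not use the mixture-of-normals prior or the computational schemes proposed in the paper. Taking n as the total count in the BIC penalty is an assumption.

```python
import numpy as np


def posterior_model_probs(log_marginals, prior=None):
    """Normalise exp(log marginal likelihood) * prior over candidate models."""
    lm = np.asarray(log_marginals, dtype=float)
    prior = np.full(lm.shape, 1.0 / lm.size) if prior is None else np.asarray(prior, float)
    w = lm + np.log(prior)
    w -= w.max()                       # log-sum-exp stabilisation
    p = np.exp(w)
    return p / p.sum()


def poisson_loglik(y, mu):
    """Poisson log-likelihood without the log(y!) term (identical across models)."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    ylogmu = np.where(y > 0, y * np.log(np.where(mu > 0, mu, 1.0)), 0.0)
    return float(np.sum(ylogmu - mu))


def bic_independence_vs_saturated(y):
    """BIC-approximated log marginal likelihoods for a two-way count table."""
    y = np.asarray(y, dtype=float)
    I, J = y.shape
    N = y.sum()
    mu_ind = np.outer(y.sum(axis=1), y.sum(axis=0)) / N
    k_ind, k_sat = I + J - 1, I * J    # number of free parameters
    bic_ind = poisson_loglik(y, mu_ind) - 0.5 * k_ind * np.log(N)
    bic_sat = poisson_loglik(y, y) - 0.5 * k_sat * np.log(N)
    return [bic_ind, bic_sat]
```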

To assess independence in two-way contingency tables, the Pearson chi-square test or Fisher's exact test is typically used. These tests assume that each subject contributes at most one count, to a single table cell (e.g., sex versus blood type). In other situations, a subject may contribute more than one count to the table, and these counts may fall in different cells. One may then wish to test independence while adjusting for the within-subject correlation. We provide a simple nonparametric bootstrap approach and assess its performance through simulation studies. The method is illustrated on subjects with multiple mental health presentations to Emergency Departments.
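
One simple way to respect within-subject correlation is to resample subjects rather than counts. This is an assumption for illustration; the paper's bootstrap may be constructed differently. The sketch applies it to a 2×2 table, bootstrapping the log odds ratio and reporting a percentile interval. An interval that excludes 0 suggests association. The 0.5 continuity correction is also an assumption, added to avoid empty cells.

```python
import numpy as np


def subject_table(obs, shape=(2, 2)):
    """Counts contributed by one subject; obs is a list of (row, col) cells."""
    t = np.zeros(shape)
    for r, c in obs:
        t[r, c] += 1
    return t


def log_odds_ratio(t, correction=0.5):
    t = t + correction
    return np.log(t[0, 0] * t[1, 1] / (t[0, 1] * t[1, 0]))


def subject_bootstrap_ci(subjects, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the log odds ratio, resampling whole subjects."""
    rng = np.random.default_rng(seed)
    tables = np.array([subject_table(s) for s in subjects])   # (n, 2, 2)
    n = len(tables)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # draw subjects with replacement
        stats[b] = log_odds_ratio(tables[idx].sum(axis=0))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return log_odds_ratio(tables.sum(axis=0)), (lo, hi)
```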

In many case-control studies, risk factors are categorized to simplify the analysis and presentation of the data. However, inconsistent categorization of continuous risk factors can make interpretation difficult. This paper evaluates the effect of the categorization procedure on the odds ratio and several measures of association. Often the risk factor is dichotomized and the data linking the risk factor and the disease are presented in a 2×2 table. We show that the odds ratio obtained from the 2×2 table is usually considerably larger than the comparable statistic that would have been obtained had a large number of cutpoints been used. Likewise, 2×2, 2×3 or 2×4 tables may be obtained using a few cutpoints on the risk factor. The measures of association for these tables are usually greater than those that would have been obtained with a large number of cutpoints. We propose an odds ratio measure that more closely approximates the odds ratio between the continuous risk factor and the disease. A corresponding measure of association is also proposed for 2×2, 2×3 and 2×4 tables.
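
Dichotomizing a continuous risk factor at a cutpoint turns case-control data into the 2×2 table discussed above. The helper below builds that table and returns its odds ratio; its name is hypothetical and not from the paper. It makes it easy to see how the estimate moves as the cutpoint changes. The paper's proposed corrected odds ratio is not reproduced here.

```python
import numpy as np


def dichotomized_odds_ratio(exposure, case, cutpoint):
    """Odds ratio from the 2x2 table of (exposure > cutpoint) by case status."""
    high = np.asarray(exposure, dtype=float) > cutpoint
    case = np.asarray(case, dtype=bool)
    a = np.sum(high & case)        # exposed cases
    b = np.sum(high & ~case)       # exposed controls
    c = np.sum(~high & case)       # unexposed cases
    d = np.sum(~high & ~case)      # unexposed controls
    if min(a, b, c, d) == 0:
        raise ValueError("empty cell: odds ratio undefined at this cutpoint")
    return (a * d) / (b * c)


exposure = [1, 2, 3, 4, 5, 6, 7, 8]
case = [0, 0, 1, 0, 1, 0, 1, 1]
ors = {cp: dichotomized_odds_ratio(exposure, case, cp) for cp in (3.5, 4.5)}
```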

Popular rank-2 and rank-3 models for two-way tables have geometric properties that can be used as diagnostic keys when screening for an appropriate model. The row and column levels of a two-way table are represented by points in two- or three-dimensional space. The collinearity and coplanarity of row and column points then provide diagnostic keys for informal model choice. The coordinates are obtained from a factorization of the two-way table Y as the matrix product Y = UVᵀ. The rows of U then contain the row-point coordinates and the rows of V the column-point coordinates. Published applications of diagnostic biplots have been restricted to data from chemistry and physics with little or no noise. In plant breeding, two-way tables containing substantial amounts of noise regularly arise in the form of genotype by environment tables. We investigated the usefulness of diagnostic biplots for model screening in genotype by environment tables. For this purpose, data tables were generated from a range of two-way models with various amounts of added noise. The chance of a correct diagnosis of the generating model depended on the type of model. On their own, diagnostic biplots do not seem to provide a sufficient means of model selection for genotype by environment tables. In combination with other methods, however, they can certainly provide extra insight into the structure of the data.
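
A rank-k factorization Y ≈ UVᵀ can be obtained from a truncated SVD. The sketch splits the singular values evenly between U and V, a common but not unique choice. It also measures how close a point set is to a line. For an additive model y_ij = a_i + b_j, which has rank 2, both the row points and the column points come out collinear. The collinearity score is an illustrative helper, not a formal test.

```python
import numpy as np


def rank_k_factors(Y, k=2):
    """Row and column points U, V with Y ~ U @ V.T (singular values split evenly)."""
    U, s, Vt = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    root = np.sqrt(s[:k])
    return U[:, :k] * root, Vt[:k].T * root


def collinearity(points):
    """Smallest/largest singular value of the centred points (0 = collinear)."""
    P = points - points.mean(axis=0)
    sv = np.linalg.svd(P, compute_uv=False)
    return sv[-1] / sv[0]


# Additive (rank-2) example: y_ij = a_i + b_j.
a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([0.0, 3.0, 5.0])
Y = a[:, None] + b[None, :]
rows, cols = rank_k_factors(Y, k=2)
```

For rank-3 models, applying the same score to points obtained with k = 3 checks coplanarity instead.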

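Total linkage analysis is usually based on the Leontief inverse matrix, which in turn is computed from competitive-import input-output tables. A derivation shows that computing the Leontief inverse from a competitive-import table implicitly assumes that the direct input (consumption) coefficient matrix of domestic industries equals that of foreign industries. Testing this assumption with input-output data provided by the OECD shows that it does not hold. Total linkage analysis should therefore use non-competitive-import input-output tables as its data source. These tables distinguish the intermediate and final use of domestic products from those of imported products. Total linkage analysis based on them is therefore not subject to the above assumption.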
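
In a non-competitive-import table, the total input coefficient matrix splits as A = A_d + A_m into domestic and imported inputs. The domestic Leontief inverse is then (I − A_d)⁻¹. A competitive-import table provides only A. Using (I − A)⁻¹ therefore implicitly treats imported inputs as if they were produced domestically with the same technology. The sketch compares the two inverses for an invented 2-sector example, with the split into A_d and A_m taken as an assumed input.

```python
import numpy as np


def leontief_inverse(A):
    A = np.asarray(A, dtype=float)
    return np.linalg.inv(np.eye(A.shape[0]) - A)


# Hypothetical direct input coefficients split by origin of the input.
A_dom = np.array([[0.15, 0.20],
                  [0.05, 0.30]])
A_imp = np.array([[0.05, 0.10],
                  [0.05, 0.10]])

L_competitive = leontief_inverse(A_dom + A_imp)   # imports treated as domestic
L_noncompetitive = leontief_inverse(A_dom)        # only domestic inputs propagate
```

For a productive nonnegative A, (I − A)⁻¹ = I + A + A² + …. Dropping imported inputs can therefore only lower total requirements. As a result, the competitive-import inverse tends to overstate domestic linkages whenever imports are non-negligible.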

For square contingency tables with ordered categories, one may sometimes wish to analyze collapsed tables in which some adjacent categories of the original table are combined. This paper considers the symmetry model for collapsed square contingency tables and proposes a measure of the degree of departure from symmetry. The proposed measure is defined as the arithmetic mean of submeasures. Each submeasure represents the degree of departure from symmetry for one collapsed 3×3 table. Each submeasure can also be expressed as the mean of the power divergence, or of a diversity index, for the corresponding collapsed table. Examples are given.
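
The averaging scheme can be sketched as follows. Every choice of two cutpoints 1 ≤ s < t ≤ r − 1 collapses an r×r ordered table into a 3×3 table. A submeasure is computed for each collapsed table, and the results are averaged. The submeasure used here is an assumption: one minus the weighted mean binary entropy (in bits) of the conditional probabilities p_ij/(p_ij + p_ji). It was chosen because it ranges from 0 (symmetry) to 1 (complete asymmetry). The paper's power-divergence submeasures may be defined differently.

```python
import itertools

import numpy as np


def collapse(P, s, t):
    """Merge ordered categories into {0..s-1}, {s..t-1}, {t..r-1} on both axes."""
    idx = [0, s, t]
    return np.add.reduceat(np.add.reduceat(P, idx, axis=0), idx, axis=1)


def asymmetry_submeasure(Q):
    """1 - weighted mean binary entropy (bits) of p_ij / (p_ij + p_ji), i != j."""
    Q = np.asarray(Q, dtype=float)
    off = ~np.eye(Q.shape[0], dtype=bool)
    delta = Q[off].sum()
    if delta == 0:
        return 0.0                      # no off-diagonal mass: treat as symmetric
    pair = Q + Q.T
    with np.errstate(divide="ignore", invalid="ignore"):
        pc = np.where(pair > 0, Q / pair, 0.5)
        terms = np.where(Q > 0, Q * np.log2(2 * pc), 0.0)
    return float(terms[off].sum() / delta)


def collapsed_symmetry_measure(P):
    """Average submeasure over all 3x3 collapsings of an r x r ordered table."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()
    r = P.shape[0]
    subs = [asymmetry_submeasure(collapse(P, s, t))
            for s, t in itertools.combinations(range(1, r), 2)]
    return float(np.mean(subs))
```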

Testing for a difference in the strength of bivariate association between two independent contingency tables is an important problem with applications in many disciplines. Many commonly used tests are based on single-index measures of association. One computes a single-index measure of association for each of the two tables and compares them using asymptotic theory. Such measures are usually easy to understand and use. However, they often discard much of the information contained in the data and therefore fail to capture the association fully. To remedy this shortcoming, we introduce a new summary statistic measuring various types of association in a contingency table. Based on this statistic, we propose a likelihood ratio test comparing the strength of association in two independent contingency tables. The proposed test examines the stochastic order between the summary statistics. We derive its asymptotic null distribution and show that the least favorable distributions are chi-bar-squared distributions. We numerically compare the power of the proposed test with that of tests based on single-index measures. Finally, we provide two examples illustrating the new summary statistics and the related tests.
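
The single-index approach criticised above can be as simple as a Wald test comparing the log odds ratios of two independent 2×2 tables. The sketch uses Woolf's variance estimate, the sum of reciprocal cell counts. This baseline is shown only for contrast; it is not the proposed likelihood ratio test.

```python
import math


def log_or_and_var(t):
    """Log odds ratio of a 2x2 table and its Woolf variance estimate."""
    (a, b), (c, d) = t
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def compare_log_odds_ratios(t1, t2):
    """Two-sided Wald z-test of H0: equal log odds ratios in two independent tables."""
    lor1, v1 = log_or_and_var(t1)
    lor2, v2 = log_or_and_var(t2)
    z = (lor1 - lor2) / math.sqrt(v1 + v2)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    return z, p
```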

For the exploratory analysis of three-way data, Tucker3 is one of the most widely applied models for three-way arrays when the data are quadrilinear. When the data consist of vectors of positive values summing to one, as in the case of compositional data, the model must account for the specific problems that compositional data analysis brings. The main purpose of this paper is to describe how to perform a Tucker3 analysis of compositional data. A second purpose is to show the relationships between the loading matrices obtained with different preprocessing procedures.
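
A quick way to obtain Tucker3 loadings and a core array is the higher-order SVD (HOSVD), applied here after a clr transformation of the parts mode. HOSVD is a non-iterative approximation, not the alternating least squares algorithm usually used to fit Tucker3. Choosing clr as the preprocessing step is an assumption; the paper compares several preprocessing procedures.

```python
import numpy as np


def clr_last_axis(X):
    """clr-transform compositions stored along the last axis."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)


def unfold(X, mode):
    """Matricize a 3-way array along the given mode."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def tucker3_hosvd(X, ranks):
    """Loading matrices A, B, C and core G with X ~ G x1 A x2 B x3 C."""
    A, B, C = (np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks))
    G = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)
    return G, A, B, C


def tucker3_reconstruct(G, A, B, C):
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)
```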


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417