1.

Decomposition of the main effects and interaction term by using orthogonal polynomials in multiple non symmetrical correspondence analysis

Antonello D'Ambra Pietro Amenta Anna Crisci 《Communications in Statistics - Theory and Methods》2017,46(20):10179-10188

The multiple non symmetric correspondence analysis (MNSCA) is a useful technique for analyzing a two-way contingency table. In more complex cases, there is more than one predictor variable. In this paper, the MNSCA, along with the decomposition of the Gray–Williams Tau index into main effects and an interaction term, is used to analyze a contingency table with two predictor categorical variables and an ordinal response variable. The Multiple-Tau index is a measure of association that contains both main effects and an interaction term. The main effects represent the change in the response variable due to the change in the levels/categories of the predictor variables, considering the effects of their addition, while the interaction effect represents the combined effect of the predictor categorical variables on the ordinal response variable. Moreover, for ordinal scale variables, we propose a further decomposition in order to check the existence of power components by using Emerson's orthogonal polynomials.

2.

A new variable importance measure for random forests with missing data

Alexander Hapfelmeier Torsten Hothorn Kurt Ulm Carolin Strobl 《Statistics and Computing》2014,24(1):21-34

Random forests are widely used in many research fields for prediction and interpretation purposes. Their popularity is rooted in several appealing characteristics, such as their ability to deal with high dimensional data, complex interactions and correlations between variables. Another important feature is that random forests provide variable importance measures that can be used to identify the most important predictor variables. Though there are alternatives like complete case analysis and imputation, existing methods for the computation of such measures cannot be applied straightforwardly when the data contain missing values. This paper presents a solution to this pitfall by introducing a new variable importance measure that is applicable to any kind of data, whether or not it contains missing values. An extensive simulation study shows that the new measure meets sensible requirements and shows good variable ranking properties. An application to two real data sets also indicates that the new approach may provide a more sensible variable ranking than the widespread complete case analysis. It takes the occurrence of missing values into account, which makes results also differ from those obtained under multiple imputation.

3.

Calculation of multidimensional stable densities

John P. Nolan Balram Rajput 《Communications in Statistics - Simulation and Computation》2013,42(3):551-566

Stable random variables are used in economics, engineering, hydrology and physics to model situations where the underlying distributions are heavy tailed. Stable densities do not generally have an explicit formula, even in one variable. This paper describes the steps necessary to calculate multivariate stable densities by numerically inverting the characteristic function. We give a program to calculate two dimensional stable densities that uses a recent two dimensional adaptive quadrature routine. Graphs of families of such densities are given for a range of values of α and various spectral measures.
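The idea of inverting a characteristic function numerically can be sketched in the simplest setting. The example below is not the program described in the abstract: it assumes a univariate, symmetric, standardized stable law with characteristic function exp(-|t|^α), and it swaps the adaptive two dimensional quadrature for a fixed-step trapezoid rule on a truncated interval. The function name and the parameters `t_max` and `n` are hypothetical.

```python
import math

def symmetric_stable_pdf(x, alpha, t_max=50.0, n=20000):
    """Approximate f(x) = (1/pi) * int_0^inf exp(-t**alpha) * cos(x*t) dt.

    Assumption: symmetric, standardized, univariate stable law.
    The infinite integral is truncated at t_max and evaluated with a
    fixed-step trapezoid rule, so small alpha (slow decay) needs a
    larger t_max than the default.
    """
    h = t_max / n
    # Endpoint terms of the trapezoid rule (integrand equals 1 at t = 0).
    total = 0.5 * (1.0 + math.exp(-t_max ** alpha) * math.cos(x * t_max))
    for k in range(1, n):
        t = k * h
        total += math.exp(-t ** alpha) * math.cos(x * t)
    return total * h / math.pi

# alpha = 1 is the Cauchy law and alpha = 2 is a normal law with variance 2,
# the two symmetric cases where a closed form is available for comparison.
cauchy_at_0 = symmetric_stable_pdf(0.0, 1.0)
normal_at_1 = symmetric_stable_pdf(1.0, 2.0)
```

Comparing against the two closed-form cases is a cheap sanity check before trusting the routine for other values of α.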

4.

Cumulative correspondence analysis using orthogonal polynomials

Antonello D'Ambra 《Communications in Statistics - Theory and Methods》2017,46(6):2942-2954

Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic in the correspondence analysis context when a two-way contingency table contains at least one ordinal categorical variable. The aim of this paper, considering the contingency table with two ordinal categorical variables, is to show a decomposition of Taguchi's index into linear, quadratic and higher-order components. This decomposition has been developed using Emerson's orthogonal polynomials. Moreover, two case studies are analyzed to explain the methodology.

5.

Nonlinear measures of association with kernel canonical correlation analysis and applications

Su-Yun Huang Mei-Hsien Lee Chuhsing Kate Hsiao 《Journal of statistical planning and inference》2009

Measures of association between two sets of random variables have long been of interest to statisticians. The classical canonical correlation analysis (LCCA) can characterize, but also is limited to, linear association. This article introduces a nonlinear and nonparametric kernel method for association study and proposes a new independence test for two sets of variables. This nonlinear kernel canonical correlation analysis (KCCA) can also be applied to the nonlinear discriminant analysis. Implementation issues are discussed. We place the implementation of KCCA in the framework of classical LCCA via a sequence of independent systems in the kernel associated Hilbert spaces. Such a placement provides an easy way to carry out the KCCA. Numerical experiments and comparison with other nonparametric methods are presented.

6.

Two-dimensional Renewal Function Approximation

Ehsan Moghimi Hadji Nirmal Singh Kambo Alagar Rangan 《Communications in Statistics - Theory and Methods》2013,42(15):3107-3124

Two-dimensional renewal functions, which are natural extensions of one-dimensional renewal functions, have wide applicability in areas where two random variables are needed to characterize the underlying process. These functions satisfy the renewal equation, which is not amenable to analytical solution. This paper proposes a simple approximation for the computation of the two-dimensional renewal function based only on the first two moments and the correlation coefficient of the variables. The approximation yields exact values of the renewal function for the bivariate exponential distribution function. Illustrations are presented to compare our approximation with that of Iskandar (1991), who provided a computational procedure which requires the use of the bivariate distribution function of the two variables. A two-dimensional warranty model is used to illustrate the approximation.

7.

Interval Estimation for the Correlation Coefficient

Xinjie Hu Aekyung Jung 《The American statistician》2020,74(1):29-36

ABSTRACT

The correlation coefficient (CC) is a standard measure of a possible linear association between two continuous random variables. The CC plays a significant role in many scientific disciplines. For a bivariate normal distribution, there are many types of confidence intervals for the CC, such as z-transformation and maximum likelihood-based intervals. However, when the underlying bivariate distribution is unknown, the construction of confidence intervals for the CC is not well-developed. In this paper, we discuss various interval estimation methods for the CC. We propose a generalized confidence interval for the CC when the underlying bivariate distribution is a normal distribution, and two empirical likelihood-based intervals for the CC when the underlying bivariate distribution is unknown. We also conduct extensive simulation studies to compare the new intervals with existing intervals in terms of coverage probability and interval length. Finally, two real examples are used to demonstrate the application of the proposed methods.
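Of the existing intervals the abstract mentions, the z-transformation interval is the easiest to write down. The sketch below shows only that classical interval, which assumes bivariate normality; it is not the generalized or empirical likelihood-based intervals the paper proposes. The function names and the sample values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_interval(r, n, level=0.95):
    """Classical z-transformation interval for a correlation.

    Assumes bivariate normal data and n > 3. atanh(r) is treated as
    approximately normal with standard error 1 / sqrt(n - 3); the
    endpoints are mapped back to the correlation scale with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)

# Hypothetical input: r = 0.5 observed on n = 30 pairs.
lower, upper = fisher_z_interval(0.5, 30)
```

The interval is asymmetric around r on the correlation scale because the symmetry holds on the atanh scale, not the original one.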

8.

Measures of Association and Visualization of Log Odds Ratio Structure for a Two Way Contingency Table

Pasquale Sarnacchiaro Luigi D'Ambra Ida Camminatiello 《Australian & New Zealand Journal of Statistics》2015,57(3):363-376

The odds ratio (OR) is a measure of association used for analysing an I × J contingency table. The total number of ORs to check grows with I and J. Several statistical methods have been developed for summarising them. These methods begin from two different starting points, the I × J contingency table and the two-way table composed of the ORs. In this paper we focus our attention on the relationship between these methods and point out that, for an exhaustive analysis of association through log ORs, it is necessary to consider all the outcomes of these methods. We also introduce some new methodological and graphical features. In order to illustrate previously used methodologies, we consider a data table of the cross-classification of the colour of eyes and hair of 5387 children from Scotland. We point out how, through the log OR analysis, it is possible to extract useful information about the association between variables.
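One common way to build a two-way table of ORs from an I × J table is to take the local odds ratios of adjacent rows and columns, which gives (I − 1)(J − 1) values. The sketch below assumes that construction and strictly positive cell counts; the abstract does not say which set of ORs the authors use, and the counts are made up for illustration, not the Scottish data.

```python
import math

def local_log_odds_ratios(table):
    """Log odds ratios of every 2 x 2 block of adjacent rows and columns.

    Assumes all counts are strictly positive. Returns an
    (I - 1) x (J - 1) nested list for an I x J input table.
    """
    rows, cols = len(table), len(table[0])
    return [
        [
            math.log(
                table[i][j] * table[i + 1][j + 1]
                / (table[i][j + 1] * table[i + 1][j])
            )
            for j in range(cols - 1)
        ]
        for i in range(rows - 1)
    ]

# Hypothetical 3 x 3 table of counts.
counts = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
log_ors = local_log_odds_ratios(counts)
```

Under independence every local log OR is zero, so the pattern of departures from zero shows where in the table the association sits.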

9.

The Robustness of Common Measures of 2×2 Association to Bias Due to Misclassifications

Helena Chmura Kraemer 《The American statistician》2013,67(4):286-290

The strength of association between two dichotomous characteristics A and B can be measured in many ways. All of these statistics are biased when there is misclassification, and all are prevalence dependent whether or not their population values are. Measures lacking fixed endpoints for random and perfect association, such as sensitivity, specificity, risk ratios, and odds ratio, have a bias either so unpredictable or so large that the observable and true measures of association may bear little resemblance to each other. Reexpressions of these measures that fix the endpoints and other measures with fixed endpoints, such as kappa, phi, gamma, risk difference, and attributable risk, produce attenuated estimates of their true values. Disattenuating such estimators is possible using test-retest data.

10.

A table of some percentage points of the distribution of a difference between independent chi-square variables

《Journal of Statistical Computation and Simulation》2012,82(3-4):169-181

The distribution of the Bell-Doksum measure of correlation is that of a difference between independent chi-square variables with equal weights. A table of percentage points computed here for the distribution may be used to test a hypothesis of no correlation between two variables. The distribution of a difference between independent chi-square variables is also useful in studying variance component estimators, and some general results corresponding to the distribution are given.

11.

Hierarchical clustering of variables: a comparison among strategies of analysis

Gabriele Soffritti 《Communications in Statistics - Simulation and Computation》2013,42(4):977-999

In this paper some hierarchical methods for identifying groups of variables are illustrated and compared. It is shown that the use of multivariate association measures between two sets of variables can overcome the drawbacks of the usually employed bivariate correlation coefficient, but the resulting methods are generally not monotonic. Thus a new multivariate association measure is proposed, based on the links existing between canonical correlation analysis and principal component analysis, which can be more suitably used for the purpose at hand. The hierarchical method based on the suggested measure is illustrated and compared with other possible solutions by analysing simulated and real data sets. Finally an extension of the suggested method to the more general situation of mixed (qualitative and quantitative) variables is proposed and theoretically discussed.

12.

A Multistage Chi-square Test for Measurement of the Degree of Association in Two-way Table

Jianping Zhu 《Communications in Statistics - Simulation and Computation》2016,45(4):1197-1212

The classical chi-square test for independence cannot convey additional information about the degree of association of two factors in a two-way contingency table. Besides, measures of association such as the contingency coefficient need to be used with care, because they depend on the number of rows r and the number of columns c. This article proposes a multistage chi-square test to measure the degree of association between the two factors in a two-way contingency table. We also give simulation and real examples to assess the performance of the proposed method. The results show that our proposed method can effectively investigate the degree of association of two factors in a two-way contingency table.
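The dependence of the contingency coefficient on table size can be seen in a small sketch. It assumes Pearson's contingency coefficient C = sqrt(χ² / (χ² + n)) and tables with no empty row or column; it does not implement the multistage test the article proposes. The two diagonal tables are hypothetical examples of perfect association.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and sample size for a two-way table.

    Assumes every row total and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2, n = pearson_chi2(table)
    return math.sqrt(chi2 / (chi2 + n))

# Both tables show perfect association, yet C differs with table size:
# its maximum for a k x k table is sqrt((k - 1) / k), not 1.
c_2x2 = contingency_coefficient([[10, 0], [0, 10]])
c_3x3 = contingency_coefficient([[10, 0, 0], [0, 10, 0], [0, 0, 10]])
```

Because the attainable maximum changes with r and c, values of C from tables of different shapes are not directly comparable.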

13.

Simple correspondence analysis using adjusted residuals

Eric J. Beh 《Journal of statistical planning and inference》2012,142(4):965-973

Correspondence analysis is a versatile statistical technique that allows the user to graphically identify the association that may exist between variables of a contingency table. For two categorical variables, the classical approach involves applying singular value decomposition to the Pearson residuals of the table. These residuals allow for one to use a simple test to determine those cells that deviate from what is expected under independence. However, the assumptions concerning these residuals are not always satisfied and so such results can lead to questionable conclusions. One may instead consider an adjustment of the Pearson residual, which is known to have properties associated with the standard normal distribution. This paper explores the application of these adjusted residuals to correspondence analysis and determines how they impact upon the configuration of points in the graphical display.
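The two kinds of residual can be computed side by side. The sketch below assumes the usual definitions: the Pearson residual (n_ij − e_ij) / sqrt(e_ij), and the adjusted residual obtained by dividing it by sqrt((1 − p_i+)(1 − p_+j)). It stops at the residuals and does not carry out the singular value decomposition step of correspondence analysis; the counts are hypothetical.

```python
import math

def residuals(table):
    """Pearson and adjusted residuals of a two-way table of counts.

    Assumes every row and column total is positive and no single row
    or column holds all observations. Returns (pearson, adjusted) as
    nested lists with the same shape as the input.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    pearson, adjusted = [], []
    for i, row in enumerate(table):
        p_row, a_row = [], []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            r = (observed - expected) / math.sqrt(expected)
            # Rescale so the residual is approximately standard normal
            # under independence.
            scale = math.sqrt((1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            p_row.append(r)
            a_row.append(r / scale)
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted

# Hypothetical 2 x 2 table.
pearson_res, adjusted_res = residuals([[10, 20], [30, 40]])
```

In a 2 × 2 table every adjusted residual has the same absolute value, and its square equals the Pearson chi-square statistic, which makes a convenient check.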

14.

A generalized analysis of the dependence structure by means of ANOVA

Antonello D'Ambra Anna Crisci Pasquale Sarnacchiaro 《Journal of applied statistics》2015,42(10):2192-2202

The multiple non-symmetric correspondence analysis (MNSCA) is a useful technique for analysing the prediction of a categorical variable through two or more predictor variables placed in a contingency table. In the MNSCA framework, the Multiple-TAU index has been proposed for summarizing the predictability between criterion and predictor variables. But it cannot be used to test association, and to overcome this limitation, a relationship with the C-Statistic has been recommended. The Multiple-TAU index is an overall measure of association that contains both main effects and interaction terms. The main effects represent the change in the response variable due to the change in the levels/categories of the predictor variables, considering the effects of their addition. On the other hand, the interaction effect represents the combined effect of predictor variables on the response variable. In this paper, we propose a decomposition of the Multiple-TAU index into main effects and interaction terms. In order to show this decomposition, we consider an empirical case that examines the relationship between the demographic characteristics of the American people, such as race, gender and location (column variables), and their propensity to move (row variable) to a new town to find a job.
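The building block behind the Multiple-TAU index is the Goodman–Kruskal tau for a single predictor. The sketch below computes only that two-way version, under the assumed convention that rows are predictor categories and columns are response categories (the opposite of the row/column layout in the abstract's example). It does not perform the decomposition into main effects and interaction, and the tables are hypothetical.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the rows.

    tau = (sum_i sum_j p_ij**2 / p_i+  -  sum_j p_+j**2) / (1 - sum_j p_+j**2)

    Assumed convention: rows = predictor, columns = response. Undefined
    (division by zero) when all observations fall in one response column.
    """
    n = sum(sum(row) for row in table)
    col_p = [sum(col) / n for col in zip(*table)]
    marginal = sum(p * p for p in col_p)
    conditional = 0.0
    for row in table:
        row_total = sum(row)
        if row_total > 0:
            # sum_j (n_ij / n)**2 / (row_total / n) = sum_j n_ij**2 / (n * row_total)
            conditional += sum(c * c for c in row) / (n * row_total)
    return (conditional - marginal) / (1.0 - marginal)

tau_perfect = goodman_kruskal_tau([[10, 0], [0, 10]])   # rows determine columns
tau_independent = goodman_kruskal_tau([[1, 2], [2, 4]])  # proportional rows
```

The index is the proportional reduction in prediction error for the response once the predictor category is known, so it runs from 0 under independence to 1 under perfect prediction.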

15.

Correspondence Analysis of Cumulative Frequencies Using a Decomposition of Taguchi's Statistic

Eric J. Beh Luigi D'ambra Biagio Simonetti 《Communications in Statistics - Theory and Methods》2013,42(9):1620-1632

Taguchi's statistic has long been known to be a more appropriate measure of association for ordinal variables than the Pearson chi-squared statistic. Therefore, there is some advantage in using Taguchi's statistic for performing correspondence analysis when a two-way contingency table consists of one ordinal categorical variable. This article will explore the development of correspondence analysis using a decomposition of Taguchi's statistic.

16.

The effect of category choice on the odds ratio and several measures of association in case-control studies

Thomas W. O'Gorman Robert F. Woolson 《Communications in Statistics - Theory and Methods》2013,42(4):1157-1171

In many case-control studies the risk factors are categorized in order to clarify the analysis and presentation of the data. However, inconsistent categorization of continuous risk factors may make interpretation difficult. This paper attempts to evaluate the effect of the categorization procedure on the odds ratio and several measures of association. Often the risk factor is dichotomized and the data linking the risk factor and the disease are presented in a 2 × 2 table. We show that the odds ratio obtained from the 2 × 2 table is usually considerably larger than the comparable statistic that would have been obtained had a large number of cutpoints been used. Also, if 2 × 2, 2 × 3, or 2 × 4 tables are obtained by using a few cutpoints on the risk factor, the measures of association for these tables are usually greater than the measure that would have been obtained had a large number of cutpoints been used. We propose an odds ratio measure that more closely approximates the odds ratio between the continuous risk factor and disease. A corresponding measure of association is also proposed for 2 × 2, 2 × 3, and 2 × 4 tables.

17.

How far from identifiability? A systematic overview of the statistical matching problem in a non parametric framework

Pier Luigi Conti Mauro Scanu 《Communications in Statistics - Theory and Methods》2017,46(2):967-994

Statistical matching consists in estimating the joint characteristics of two variables observed in two distinct and independent sample surveys, respectively. In a parametric setup, ranges of estimates for non identifiable parameters are the only estimable items, unless restrictive assumptions on the probabilistic relationship between the non jointly observed variables are imposed. These ranges correspond to the uncertainty due to the absence of joint observations on the pair of variables of interest. The aim of this paper is to analyze the uncertainty in statistical matching in a non parametric setting. A measure of uncertainty is introduced, and its properties studied: this measure studies the “intrinsic” association between the pair of variables, which is constant and equal to 1/6 whatever the form of the marginal distribution functions of the two variables when knowledge on the pair of variables is the only one available in the two samples. This measure becomes useful in the context of the reduction of uncertainty due to further knowledge than data themselves, as in the case of structural zeros. In this case the proposed measure detects how the introduction of further knowledge shrinks the intrinsic uncertainty from 1/6 to smaller values, zero being the case of no uncertainty. Sampling properties of the uncertainty measure and of the bounds of the uncertainty intervals are also proved.

18.

ON THE ANALYSIS AND APPLICATION OF MEASURES OF LINKAGE DISEQUILIBRIUM

Sing Kai Lo 《Australian & New Zealand Journal of Statistics》1991,33(3):249-259

The maximum likelihood, jackknife and bootstrap estimators of linkage disequilibrium, a measure of association in population genetics, are derived and compared. It is found that for point estimation, the resampling methods generate almost identical mean square errors. The maximum likelihood estimator could have bigger or smaller mean square errors depending on the parameters of the underlying population. However, the bootstrap confidence interval is superior to the other two, as the length of the intervals is shorter or the probability that the 95% confidence intervals include the true parameter is closer to 0.95. Although the standardised measure of linkage disequilibrium has a range from -1 to 1 regardless of marginal frequencies, it is shown that the distribution of this standardised measure is still not allele frequency independent under the multinomial sampling scheme.

19.

Measuring association between nominal categorical variables: an alternative to the Goodman–Kruskal lambda

Tarald O. Kvålseth 《Journal of applied statistics》2018,45(6):1118-1132

As a measure of association between two nominal categorical variables, the lambda coefficient, or Goodman–Kruskal's lambda, has become a most popular measure. Its popularity is primarily due to its simple and meaningful definition and interpretation in terms of the proportional reduction in error when predicting a random observation's category for one variable given (versus not knowing) its category for the other variable. It is an asymmetric measure, although a symmetric version is available. The lambda coefficient does, however, have a widely recognized limitation: it can equal zero even when there is no independence between the variables and when all other measures take on positive values. In order to mitigate this problem, an alternative lambda coefficient is introduced in this paper as a slight modification of the Goodman–Kruskal lambda. The properties of the new measure are discussed and a symmetric form is introduced. A statistical inference procedure is developed and a numerical example is provided.
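The limitation described above is easy to reproduce. The sketch below computes the classical asymmetric Goodman–Kruskal lambda only, not the alternative coefficient the paper introduces, under the assumed convention that rows are the predictor and columns the predicted variable. The tables are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Classical asymmetric lambda for predicting the column from the row.

    lambda = (sum_i max_j n_ij - max_j n_+j) / (n - max_j n_+j)

    Assumed convention: rows = predictor, columns = predicted variable.
    Undefined (division by zero) when one column holds all observations.
    """
    n = sum(sum(row) for row in table)
    best_without_x = max(sum(col) for col in zip(*table))
    best_with_x = sum(max(row) for row in table)
    return (best_with_x - best_without_x) / (n - best_without_x)

# Rows differ (30/40 vs 20/35 in the first column), so the variables are
# not independent, but the modal column is the same in every row and
# knowing the row never changes the best guess.
lam_dependent = goodman_kruskal_lambda([[30, 10], [20, 15]])
lam_perfect = goodman_kruskal_lambda([[10, 0], [0, 10]])
```

Here `lam_dependent` is exactly zero because both rows share the same modal column, which is the situation the alternative coefficient is meant to address.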

20.

Estimating Archimedean Copulas in High Dimensions

CHRISTIAN HERING ULRICH STADTMÜLLER 《Scandinavian Journal of Statistics》2012,39(3):461-479

Abstract. This article presents a novel estimation procedure for high-dimensional Archimedean copulas. In contrast to maximum likelihood estimation, the method presented here does not require derivatives of the Archimedean generator. This is computationally advantageous for high-dimensional Archimedean copulas in which higher-order derivatives are needed but are often difficult to obtain. Our procedure is based on a parameter-dependent transformation of the underlying random variables to a one-dimensional distribution where a minimum-distance method is applied. We show strong consistency of the resulting minimum-distance estimators in the case of known margins as well as in the case of unknown margins when pseudo-observations are used. Moreover, we conduct a simulation comparing the performance of the proposed estimation procedure with the well-known maximum likelihood approach according to bias and standard deviation.