1.

Dealing with big data: comparing dimension reduction and shrinkage regression methods

Hamideh D. Hamedani Sara Sadat Moosavi 《Journal of applied statistics》2017,44(3):511-532

In the past decades, the number of variables explaining observations in different practical applications has increased gradually. This has led to heavy computational tasks, despite the wide use of provisional variable selection methods in data processing. Therefore, more methodological techniques have appeared to reduce the number of explanatory variables without losing much of the information. In these techniques, two distinct approaches are apparent: ‘shrinkage regression’ and ‘sufficient dimension reduction’. Surprisingly, there has not been any communication or comparison between these two methodological categories, and it is not clear when each of these two approaches is appropriate. In this paper, we fill some of this gap by first reviewing each category in brief, paying special attention to the most commonly used methods in each category. We then compare commonly used methods from both categories based on their accuracy, computation time, and their ability to select effective variables. A simulation study on the performance of the methods in each category is conducted as well. The selected methods are concurrently tested on two sets of real data, which allows us to recommend conditions under which one approach is more appropriate for high-dimensional data.

2.

Extending dual multiple factor analysis to categorical tables

Elena Abascal Vidal Díaz de Rada M. Isabel Landaluce 《Journal of applied statistics》2013,40(2):415-428

This paper describes a proposal for the extension of the dual multiple factor analysis (DMFA) method developed by Lê and Pagès [15] to the analysis of categorical tables in which the same set of variables is measured on different sets of individuals. The extension of DMFA is based on the transformation of categorical variables into properly weighted indicator variables, in a way analogous to that used in the multiple factor analysis of categorical variables. The DMFA of categorical variables enables visual comparison of the association structures between categories over the sample as a whole and in the various subsamples (sets of individuals). For each category, DMFA allows us to obtain its global (considering all the individuals) and partial (considering each set of individuals) coordinates in a factor space. This visual analysis allows us to compare the sets of individuals to identify their similarities and differences. The suitability of the technique is illustrated through two applications: one using simulated data for two groups of individuals with very different association structures and the other using real data from a voting intention survey in which some respondents were interviewed by telephone and others face to face. The results indicate that the two data collection methods, while similar, are not entirely equivalent.

3.

Joint models for mixed categorical outcomes: a study of HIV risk perception and disease status in Mozambique

Osvaldo Loquiha Niel Hens Emilia Martins-Fonteyn Herman Meulemans Edwin Wouters Marleen Temmerman 《Journal of applied statistics》2018,45(10):1781-1798

Two types of bivariate models for categorical response variables are introduced to deal with special categories such as ‘unsure’ or ‘unknown’ in combination with other ordinal categories, while taking additional hierarchical data structures into account. The latter is achieved by the use of different covariance structures for a trivariate random effect. The models are applied to data from the INSIDA survey, where interest lies in the effect of covariates on the association between HIV risk perception (quadrinomial with an ‘unknown risk’ category) and HIV infection status (binary). The final model combines continuation-ratio with cumulative link logits for the risk perception, together with partly correlated and partly shared trivariate random effects for the household level. The results indicate that only age has a significant effect on the association between HIV risk perception and infection status. The proposed models may be useful in various fields of application such as social and biomedical sciences, epidemiology and public health.

4.

Category Distinguishability and Observer Agreement (Cited by: 1 in total; self-citations: 0; citations by others: 1)

J. N. Darroch P. I. McCloud 《Australian & New Zealand Journal of Statistics》1986,28(3):371-388

It is common in the medical, biological, and social sciences for the categories into which an object is classified not to have a fully objective definition. Theoretically speaking, the categories are therefore not completely distinguishable. The practical extent of their distinguishability can be measured when two expert observers classify the same sample of objects. It is shown, under reasonable assumptions, that the matrix of joint classification probabilities is quasi-symmetric, and that the symmetric matrix component is non-negative definite. The degree of distinguishability between two categories is defined and is used to give a measure of overall category distinguishability. It is argued that the kappa measure of observer agreement is unsatisfactory as a measure of overall category distinguishability.
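
For readers unfamiliar with the kappa measure that this abstract criticises, the following is a minimal sketch of the standard (unweighted) Cohen's kappa for a square table of joint classifications by two observers. It is not the distinguishability measure proposed in the paper; the function name `cohen_kappa` and the example counts are hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] = number of objects that observer 1 put in category i
    and observer 2 put in category j (illustrative data layout).
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of objects on the main diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance if the two observers were independent.
    row_margins = [sum(row) / n for row in table]
    col_margins = [sum(col) / n for col in zip(*table)]
    p_expected = sum(r * c for r, c in zip(row_margins, col_margins))
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts for two observers and two categories.
example = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(example), 3))
```

Kappa summarises the whole table in one number, which is one reason a separate, category-by-category notion of distinguishability may be of interest.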

5.

Marginal inhomogeneity models for square contingency tables with nominal categories

Nobuko Miyamoto Kouji Tahata Hirokazu Ebie Sadao Tomizawa 《Journal of applied statistics》2006,33(2):203-215

For the analysis of square contingency tables with nominal categories, this paper proposes two kinds of models that indicate the structure of marginal inhomogeneity. One model states that the absolute values of log odds of the row marginal probability to the corresponding column marginal probability for each category i are constant for every i. The other model states that, on the condition that an observation falls in one of the off-diagonal cells in the square table, the absolute values of log odds of the conditional row marginal probability to the corresponding conditional column marginal probability for each category i are constant for every i. These models are used when the marginal homogeneity model does not hold, and the values of parameters in the models are useful for seeing the degree of departure from marginal homogeneity for the data on a nominal scale. Examples are given.

6.

The treatment of substitution bias in consumer price index: An alternative approach

Ignazio Drudi 《Statistical Methods and Applications》2002,11(3):395-404

7.

Estimation of non-parametric regression for dasometric measures

E. Ayuga Téllez A.J. Martín Fernández C. González García E. Martínez Falero 《Journal of applied statistics》2006,33(8):819-836

The aim of this paper is to describe a simulation procedure to compare parametric regression against a non-parametric regression method, for different functions and sets of information. The proposed methodology improves lack of fit at the edges of the regression curves, and an acceptable result is obtained for the non-parametric estimation in all studied cases. Larger differences appear at the edges of the estimation. The results are applied to the study of dasometric variables, which do not fulfil the normality hypothesis needed for parametric estimation. The kernel regression shows the relationship between the studied variables, which would not be detected with more rigid parametric models.

8.

Moderate deviations for the random weighted sums of END random variables with consistently varying tails

Yinghua Dong Dingcheng Wang 《Communications in Statistics - Theory and Methods》2017,46(20):10116-10134

In this paper, we study moderate deviations for random weighted sums of extended negative dependent (END) random variables, which are consistently-varying tailed and not necessarily identically distributed. When these END random variables are independent of their weights, and the weights are positive random variables with two-sided bounds, the results show that the END structure and the dependence between the weights have no effect on the asymptotic behavior of moderate deviations of partial sums and random sums.

9.

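Regional innovation efficiency and its influencing factors: an analysis based on a latent class stochastic frontier

Yongjian Lai (赖永剑) 《Statistics & Information Forum》2014,(10):52-57

Using a latent class stochastic frontier model, which groups the units under study endogenously according to their latent attributes, and data on China's provinces for 1999-2012, this paper studies the innovation efficiency of each province and its influencing factors. The results show that, with human capital level and infrastructure conditions as the conditioning variables, the provinces are divided into two technology classes, each with its own technology frontier and functional form. In class A, Shanghai has the highest innovation efficiency, and in class B, Hebei Province has the highest. On average, innovation efficiency shows an upward trend in both classes. Trade openness, industrial structure and financial development all have significant positive effects on innovation efficiency, and club convergence of innovation efficiency exists within each class.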

10.

Andrew M. Jones Xander Koolman Nigel Rice 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(3):543-569

11.

Measure of Asymmetry for Square Contingency Tables Having Ordered Categories

Sadao Tomizawa Nobuko Miyamoto & Yusuke Hatanaka 《Australian & New Zealand Journal of Statistics》2001,43(3):335-349

For the analysis of square contingency tables with nominal categories, Tomizawa and coworkers have considered measures that represent the degree of departure from symmetry. This paper proposes a measure that represents the degree of asymmetry for square contingency tables with ordered categories (instead of those with nominal categories). The measure proposed is expressed using the Cressie–Read power-divergence or Patil–Taillie diversity index, defined for the cumulative probabilities that an observation falls in row (column) category i or below and column (row) category j (> i) or above. The measure depends on the order of listing the categories. It should be useful for comparing the degree of asymmetry in several tables with ordered categories. The relationship between the measure and the normal distribution is shown.

12.

Cohen’s kappa is a weighted average (Cited by: 1 in total; self-citations: 0; citations by others: 1)

Matthijs J. Warrens 《Statistical Methodology》2011,8(6):473-484

13.

On Modelling Agreement and Category Distinguishability on an Ordinal Scale

Lianyan Fu Man-Lai Tang Ning-Zhong Shi 《Communications in Statistics - Theory and Methods》2013,42(24):4413-4426

It is quite common for raters to classify a sample of subjects on a categorical scale. Perfect agreement can rarely be observed, partly because of different perceptions about the meanings of the category labels between raters and partly because of factors such as intrarater variability. Usually, category indistinguishability occurs between adjacent categories. In this article, we propose a simple log-linear model combining ordinal scale information and category distinguishability between ordinal categories for modelling agreement between two raters. For the proposed model, no score assignment to the ordinal categories is required. An algorithm and statistical properties will be provided.

14.

Multiple imputation methods for recurrent event data with missing event category

Douglas E. Schaubel Jianwen Cai 《Revue canadienne de statistique》2006,34(4):677-692

Frequently in clinical and epidemiologic studies, the event of interest is recurrent (i.e., can occur more than once per subject). When the events are not of the same type, an analysis which accounts for the fact that events fall into different categories will often be more informative. Often, however, although event times may always be known, information through which events are categorized may potentially be missing. Complete‐case methods (whose application may require, for example, that events be censored when their category cannot be determined) are valid only when event categories are missing completely at random. This assumption is rather restrictive. The authors propose two multiple imputation methods for analyzing multiple‐category recurrent event data under the proportional means/rates model. The use of a proper or improper imputation technique distinguishes the two approaches. Both methods lead to consistent estimation of regression parameters even when the missingness of event categories depends on covariates. The authors derive the asymptotic properties of the estimators and examine their behaviour in finite samples through simulation. They illustrate their approach using data from an international study on dialysis.

15.

Measuring association between nominal categorical variables: an alternative to the Goodman–Kruskal lambda

Tarald O. Kvålseth 《Journal of applied statistics》2018,45(6):1118-1132

As a measure of association between two nominal categorical variables, the lambda coefficient, or Goodman–Kruskal's lambda, has become one of the most popular measures. Its popularity is primarily due to its simple and meaningful definition and interpretation in terms of the proportional reduction in error when predicting a random observation's category for one variable given (versus not knowing) its category for the other variable. It is an asymmetric measure, although a symmetric version is available. The lambda coefficient does, however, have a widely recognized limitation: it can equal zero even when the variables are not independent and when all other measures take on positive values. In order to mitigate this problem, an alternative lambda coefficient is introduced in this paper as a slight modification of the Goodman–Kruskal lambda. The properties of the new measure are discussed and a symmetric form is introduced. A statistical inference procedure is developed and a numerical example is provided.
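
The limitation described in this abstract can be seen on a small table. The sketch below computes the classical asymmetric Goodman–Kruskal lambda (predicting the column variable from the row variable), not the alternative coefficient proposed in the paper; the function name and the example counts are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Classical Goodman-Kruskal lambda for predicting columns from rows.

    table[i][j] = count of observations in row category i and
    column category j (illustrative data layout).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Correct predictions when the row is unknown: always guess the modal column.
    correct_without_row = max(col_totals)
    # Correct predictions when the row is known: guess the modal column of that row.
    correct_with_row = sum(max(row) for row in table)
    if n == correct_without_row:
        raise ValueError("lambda is undefined when every observation is in one column")
    # Proportional reduction in prediction error.
    return (correct_with_row - correct_without_row) / (n - correct_without_row)


# Rows and columns are dependent here (30/40 differs from 20/30),
# yet the same column is modal in every row, so lambda is zero.
dependent = [[30, 10],
             [20, 10]]
print(goodman_kruskal_lambda(dependent))

# Perfect association for comparison.
perfect = [[40, 0],
           [0, 30]]
print(goodman_kruskal_lambda(perfect))
```

Lambda vanishes whenever knowing the row never changes the best guess for the column, whatever the strength of the underlying dependence.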

16.

Kernel partial correlation: a novel approach to capturing conditional independence in graphical models for noisy data

Jihwan Oh Faye Zheng R. W. Doerge 《Journal of applied statistics》2018,45(14):2677-2696

Graphical models capture the conditional independence structure among random variables via existence of edges among vertices. One way of inferring a graph is to identify zero partial correlation coefficients, which is an effective way of finding conditional independence under a multivariate Gaussian setting. For more general settings, we propose kernel partial correlation which extends partial correlation with a combination of two kernel methods. First, a nonparametric function estimation is employed to remove effects from other variables, and then the dependence between remaining random components is assessed through a nonparametric association measure. The proposed approach is not only flexible but also robust under high levels of noise owing to the robustness of the nonparametric approaches.

17.

Ordinal ridge regression with categorical predictors

Faisal M. Zahid Shahla Ramzan 《Journal of applied statistics》2012,39(1):161-171

In multi-category response models, categories are often ordered. In the case of ordinal response models, the usual likelihood approach becomes unstable with an ill-conditioned predictor space or when the number of parameters to be estimated is large relative to the sample size. The likelihood estimates do not exist when the number of observations is less than the number of parameters. The same problem arises if the constraint on the order of intercept values is not met during the iterative procedure. Proportional odds models (POMs) are most commonly used for ordinal responses. In this paper, penalized likelihood with a quadratic penalty is used to address these issues, with a special focus on POMs. To avoid large differences between two parameter values corresponding to the consecutive categories of an ordinal predictor, the differences between the parameters of two adjacent categories should be penalized. The considered penalized-likelihood function penalizes the parameter estimates or differences between the parameter estimates according to the type of predictors. Mean-squared error for parameter estimates, deviance of fitted probabilities and prediction error for ridge regression are compared with usual likelihood estimates in a simulation study and an application.
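
The two kinds of quadratic penalty mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' exact formulation: the coefficients are taken to be dummy-coded with the reference category fixed at zero, the penalty is scaled by `lam / 2`, and the function name `quadratic_penalty` is hypothetical.

```python
def quadratic_penalty(coefs, ordinal, lam):
    """Quadratic (ridge-type) penalty for one categorical predictor.

    coefs   : coefficients of the non-reference categories, in category order
              (assumption: the reference category has coefficient 0).
    ordinal : if True, penalize differences between adjacent categories;
              if False, penalize the coefficients themselves.
    lam     : non-negative tuning parameter.
    """
    if ordinal:
        # Prepend the reference category so the first difference is coefs[0] - 0.
        padded = [0.0] + list(coefs)
        terms = [(b - a) ** 2 for a, b in zip(padded, padded[1:])]
    else:
        terms = [b ** 2 for b in coefs]
    return 0.5 * lam * sum(terms)


coefs = [0.5, 1.0, 1.5]  # hypothetical, smoothly increasing effects
print(quadratic_penalty(coefs, ordinal=True, lam=2.0))   # small: adjacent steps are small
print(quadratic_penalty(coefs, ordinal=False, lam=2.0))  # larger: the levels themselves are penalized
```

In a penalized-likelihood fit, a term of this kind would be subtracted from the log-likelihood before maximization, so that smooth effects across adjacent ordinal categories cost less than erratic ones.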

18.

Rounding non-binary categorical variables following multivariate normal imputation: evaluation of simple methods and implications for practice

《Journal of Statistical Computation and Simulation》2012,82(4):798-811

We study bias arising from rounding categorical variables following multivariate normal (MVN) imputation. This task has been well studied for binary variables, but not for more general categorical variables. Three methods that assign imputed values to categories based on fixed reference points are compared using 25 specific scenarios covering variables with k=3, …, 7 categories, and five distributional shapes, and for each k=3, …, 7, we examine the distribution of bias arising over 100,000 distributions drawn from a symmetric Dirichlet distribution. We observed, on both empirical and theoretical grounds, that one method (projected-distance-based rounding) is superior to the other two methods, and that the risk of invalid inference with the best method may be too high at sample sizes n≥150 at 50% missingness, n≥250 at 30% missingness and n≥1500 at 10% missingness. Therefore, these methods are generally unsatisfactory for rounding categorical variables (with up to seven categories) following MVN imputation.

19.

Marginal asymmetry model for square contingency tables with ordered categories

Kouji Tahata Takuya Yoshimoto 《Journal of applied statistics》2015,42(2):371-379

For the analysis of square contingency tables with ordered categories, this paper proposes a model which indicates the structure of marginal asymmetry. The model states that the absolute value of the logarithm of the ratio of the cumulative probability that an observation will fall in row category i or below and column category i+1 or above to the corresponding cumulative probability that the observation falls in column category i or below and row category i+1 or above is constant for every i. We deal with the estimation problem for the model parameter and goodness-of-fit tests. We also discuss the relationships between the model and a measure which represents the degree of departure from marginal homogeneity. Examples are given.
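
The quantity that this model constrains can be computed directly from a table. The sketch below evaluates, for each cut point i, the log ratio of the two cumulative sums described in the abstract, using observed counts in place of probabilities (the sample size cancels in the ratio). It does not fit the model or test it; the function name and the example counts are hypothetical.

```python
import math


def cumulative_log_ratios(table):
    """Log ratios of upper-right to lower-left cumulative sums of a square table.

    For each cut point i = 1, ..., r-1 (1-based), compares
      upper = sum over row <= i and column >= i+1
      lower = sum over row >= i+1 and column <= i.
    Assumes both sums are positive (no handling of empty corners).
    """
    r = len(table)
    ratios = []
    for i in range(1, r):
        upper = sum(table[s][t] for s in range(i) for t in range(i, r))
        lower = sum(table[s][t] for s in range(i, r) for t in range(i))
        ratios.append(math.log(upper / lower))
    return ratios


# Hypothetical 3x3 counts in which both cut points give a ratio of 6/3,
# so the absolute log ratio is the same constant at every i.
example = [[10, 4, 2],
           [2, 10, 4],
           [1, 2, 10]]
print(cumulative_log_ratios(example))
```

Under the model in the abstract, the absolute values of these log ratios share one common value across all cut points; for a symmetric table they are all zero.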

20.

A stepwise algorithm for selecting category boundaries for the chi-squared goodness-of-fit test

Steve M. Bajgier Lalit K. Aggarwal 《Communications in Statistics - Theory and Methods》2013,42(7):2061-2081

A stepwise algorithm for selecting categories for the chi-squared goodness-of-fit test with completely specified continuous null and alternative distributions is described in this paper. The procedure's starting point is an initial partitioning of the sample space into a large number of categories. A second partition with one fewer category is constructed by combining two categories of the original partition. The procedure continues until there are only two categories; the partition in the sequence with the highest estimated power is the one chosen. For illustrative purposes, the performance of the algorithm is evaluated for several hypothesis tests of the form H₀: normal distribution vs. H₁: a specific mixed normal distribution. For each test considered, the partition identified by the algorithm was compared to several equiprobable partitions, including the equiprobable partition with the highest estimated power. In all cases but one, the algorithm identified a partition with higher estimated power than the best equiprobable partition. Applications of the procedure are discussed.