Similar documents
Found 20 similar documents; search took 31 ms
1.
Many survey questions allow respondents to pick any number out of c possible categorical responses or “items”. These kinds of survey questions often use the terminology “choose all that apply” or “pick any”. It is often of interest to determine whether the marginal response distributions of each item differ among r different groups of respondents. Agresti and Liu (1998, 1999) call this a test for multiple marginal independence (MMI). If respondents are allowed to pick only one out of c responses, the hypothesis test may be performed using the Pearson chi-square test of independence. However, since respondents may pick more or fewer than one response, the test's assumption that responses are made independently of each other is violated. Recently, a few MMI testing methods have been proposed. Loughin and Scherer (1998) propose using a bootstrap method based on a modified version of the Pearson chi-square test statistic. Agresti and Liu (1998, 1999) propose using marginal logit models, quasisymmetric loglinear models, and a few methods based on Pearson chi-square test statistics. Decady and Thomas (1999) propose using a Rao-Scott adjusted chi-squared test statistic. There has not been a full investigation of these MMI testing methods. The purpose here is to evaluate the proposed methods and propose a few new methods. Recommendations are given to guide the practitioner in choosing which MMI testing methods to use.
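The idea behind a modified Pearson statistic for pick-any data can be sketched as follows. This is a minimal illustration, not the procedure of any of the cited papers: it assumes the modified statistic is the sum, over the c items, of the ordinary Pearson chi-square statistics of the r × 2 tables (group by picked / not picked). The function names `pearson_chi2` and `modified_chi2` are hypothetical. Because one respondent contributes to several of these tables, the sum does not follow a standard chi-square distribution, which is why resampling or adjusted tests are proposed for the p-value.

```python
from typing import Sequence


def pearson_chi2(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty margins (item picked by all or none)
                stat += (observed - expected) ** 2 / expected
    return stat


def modified_chi2(groups: Sequence[int], picks: Sequence[Sequence[int]]) -> float:
    """Sum over items of the chi-square statistic of the group x (picked, not picked) table.

    groups[k] is the group label of respondent k; picks[k][i] is 1 if
    respondent k picked item i and 0 otherwise (assumed data layout).
    """
    labels = sorted(set(groups))
    n_items = len(picks[0])
    total = 0.0
    for item in range(n_items):
        table = []
        for g in labels:
            picked = sum(p[item] for grp, p in zip(groups, picks) if grp == g)
            size = sum(1 for grp in groups if grp == g)
            table.append([picked, size - picked])
        total += pearson_chi2(table)
    return total


# Two groups of four respondents, two items.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
picks = [[1, 1]] * 4 + [[0, 1]] * 4
statistic = modified_chi2(groups, picks)
```

In this toy data set item 0 separates the groups perfectly while item 1 is picked by everyone, so only item 0 contributes to the statistic.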

2.
We study bias arising from rounding categorical variables following multivariate normal (MVN) imputation. This task has been well studied for binary variables, but not for more general categorical variables. Three methods that assign imputed values to categories based on fixed reference points are compared using 25 specific scenarios covering variables with k=3, …, 7 categories, and five distributional shapes, and for each k=3, …, 7, we examine the distribution of bias arising over 100,000 distributions drawn from a symmetric Dirichlet distribution. We observed, on both empirical and theoretical grounds, that one method (projected-distance-based rounding) is superior to the other two methods, and that the risk of invalid inference with the best method may be too high at sample sizes n≥150 at 50% missingness, n≥250 at 30% missingness and n≥1500 at 10% missingness. Therefore, these methods are generally unsatisfactory for rounding categorical variables (with up to seven categories) following MVN imputation.

3.

Influence diagnostics are investigated in this study. In particular, an approach based on the generalized linear mixed model setting is presented for formulating ordered categorical counts in stratified contingency tables. Deletion diagnostics and their first-order approximations are developed for assessing the stratum-specific influence on parameter estimates in the models. To illustrate the proposed model diagnostic technique, the method is applied to analyze two sets of data: a clinical trial and a survey study. The two examples demonstrate that the presence of influential strata may substantially change the results in ordinal contingency table analysis.

4.
In this paper, procedures for all pairwise comparisons of location parameters of negative exponential populations are developed when the common scale parameter is known or unknown, using large sample distributional approximations of the relevant random variables. The small sample performance of these procedures is then examined using Monte Carlo simulation.

5.
It is well known that the Pearson statistic \(\chi^{2}\) can perform poorly in studying the association between ordinal categorical variables. Taguchi’s and Hirotsu’s statistics have been introduced in the literature as simple alternatives to Pearson’s chi-squared test for contingency tables with ordered categorical variables. The aim of this paper is to shed new light on these statistics by stressing their interpretations and characteristics. Moreover, a theoretical scheme is developed showing the links between the different proposals and classes of cumulative chi-squared statistical tests, starting from a unifying index of heterogeneity, unalikeability and variability measures. Practitioners may find it useful to understand the different proposals well. Some decompositions of both statistics are also highlighted. This paper presents a case study of optimizing the polysilicon deposition process in a very large-scale integrated circuit, to identify the optimal combination of factor levels. It is obtained by means of the information coming from a correspondence analysis based on Taguchi’s statistic and regression models for binary dependent variables. A new optimal combination of factor levels is obtained, different from many others proposed in the literature for this data.

6.
In contrast to the common belief that the logit model has no analytical solution, it is possible to find such a solution in the case of categorical predictors. This paper shows that a binary logistic regression with categorical explanatory variables can be constructed in closed form. No special software and no iterative procedures of nonlinear estimation are needed to obtain a model with all its parameters and characteristics, including coefficients of regression, their standard errors and t-statistics, as well as the residual and null deviances. The derivation is performed for logistic models with one binary or categorical predictor, and with several binary or categorical predictors. The analytical formulae can be used for arithmetical calculation of all the parameters of the logit regression. The explicit expressions for the characteristics of logit regression are convenient for the analysis and interpretation of the results of logistic modeling.
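The simplest case, one binary predictor, can be worked through by hand. The sketch below is not taken from the paper; it uses the standard textbook result that for a 2 × 2 table the maximum likelihood fit reproduces the observed proportions, so the intercept is the log-odds in the reference group, the slope is the log odds ratio, and the Wald standard errors are square roots of sums of reciprocal cell counts. The function name `logit_2x2` and its argument names are hypothetical.

```python
import math


def logit_2x2(y1_x0: int, y0_x0: int, y1_x1: int, y0_x1: int) -> dict:
    """Closed-form ML fit of logit P(y=1) = b0 + b1*x for a binary x.

    Arguments are the four cell counts (all must be positive):
    y1_x0 = successes when x=0, y0_x0 = failures when x=0, and so on.
    """
    b0 = math.log(y1_x0 / y0_x0)              # log-odds in the reference group
    b1 = math.log(y1_x1 / y0_x1) - b0         # log odds ratio
    se_b0 = math.sqrt(1 / y1_x0 + 1 / y0_x0)
    se_b1 = math.sqrt(1 / y1_x0 + 1 / y0_x0 + 1 / y1_x1 + 1 / y0_x1)
    return {
        "b0": b0,
        "b1": b1,
        "se_b0": se_b0,
        "se_b1": se_b1,
        "wald_b1": b1 / se_b1,                # Wald statistic for the slope
    }


# 10 successes / 30 failures at x=0, 20 successes / 20 failures at x=1.
fit = logit_2x2(10, 30, 20, 20)
```

Here the odds are 1/3 at x=0 and 1 at x=1, so the slope equals log 3 without any iteration.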

7.
A general framework is proposed for modelling clustered mixed outcomes. A mixture of generalized linear models is used to describe the joint distribution of a set of underlying variables, and an arbitrary function relates the underlying variables to the observed outcomes. The model accommodates multilevel data structures, general covariate effects and distinct link functions and error distributions for each underlying variable. Within the framework proposed, novel models are developed for clustered multiple binary, unordered categorical and joint discrete and continuous outcomes. A Markov chain Monte Carlo sampling algorithm is described for estimating the posterior distributions of the parameters and latent variables. Because of the flexibility of the modelling framework and estimation procedure, extensions to ordered categorical outcomes and more complex data structures are straightforward. The methods are illustrated by using data from a reproductive toxicity study.

8.
Multinomial logit (also termed multi-logit) models permit the analysis of the statistical relation between a categorical response variable and a set of explanatory variables (called covariates or regressors). Although multinomial logit is widely used in both the social and economic sciences, the interpretation of regression coefficients may be tricky, as the effect of covariates on the probability distribution of the response variable is nonconstant and difficult to quantify. The ternary plots illustrated in this article aim at facilitating the interpretation of regression coefficients and permit the effect of covariates (considered either singly or jointly) on the probability distribution of the dependent variable to be quantified. Ternary plots can be drawn both for ordered and for unordered categorical dependent variables, when the number of possible outcomes equals three (trinomial response variable); these plots make it possible not only to represent the covariate effects over the whole parameter space of the dependent variable but also to compare the covariate effects for any given individual profile. The method is illustrated and discussed through analysis of a dataset concerning the transition of master’s graduates of the University of Trento (Italy) from university to employment.
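Why the covariate effect is nonconstant can be seen from a small numerical sketch. It assumes the usual baseline-category parameterization with a single covariate x; the function name `trinomial_probs` and the coefficient values are hypothetical and are not taken from the article. Each call returns three probabilities that sum to one, which is exactly the kind of point a ternary plot displays.

```python
import math
from typing import List, Sequence, Tuple


def trinomial_probs(x: float, coefs: Sequence[Tuple[float, float]]) -> List[float]:
    """Outcome probabilities of a three-category multinomial logit.

    coefs holds (intercept, slope) for outcomes 2 and 3; outcome 1 is the
    baseline category with linear predictor fixed at 0.
    """
    scores = [0.0] + [a + b * x for a, b in coefs]
    top = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


coefs = [(0.0, 1.0), (0.0, -1.0)]          # illustrative coefficients only

# The same one-unit increase in x moves P(outcome 2) by different amounts
# depending on where it starts.
change_low = trinomial_probs(1.0, coefs)[1] - trinomial_probs(0.0, coefs)[1]
change_high = trinomial_probs(4.0, coefs)[1] - trinomial_probs(3.0, coefs)[1]
```

With these coefficients the change in the probability of outcome 2 is much larger between x=0 and x=1 than between x=3 and x=4, even though the slope is the same.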

9.
Mixture separation for mixed-mode data   Cited by: 3 (self-citations: 0, citations by others: 3)
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.

10.
A general class of multivariate regression models is considered for repeated measurements with discrete and continuous outcome variables. The proposed model is based on the seemingly unrelated regression model (Zellner, 1962) and an extension of the model of Park and Woolson (1992). The regression parameters of the model are consistently estimated using the two-stage least squares method. When the outcome variables are multivariate normal, the two-stage estimator reduces to Zellner’s two-stage estimator. As a special case, we consider the marginal distribution described by Liang and Zeger (1986). Under this distributional assumption, we show that the two-stage estimator has similar asymptotic properties and comparable small sample properties to Liang and Zeger's estimator. Since the proposed approach is based on the least squares method, however, no distributional assumption is required for the outcome variables. As a result, the proposed estimator is more robust to the marginal distribution of outcomes.

11.
Missing observations due to non‐response are commonly encountered in data collected from sample surveys. The focus of this article is on item non‐response which is often handled by filling in (or imputing) missing values using the observed responses (donors). Random imputation (single or fractional) is used within homogeneous imputation classes that are formed on the basis of categorical auxiliary variables observed on all the sampled units. A uniform response rate within classes is assumed, but that rate is allowed to vary across classes. We construct confidence intervals (CIs) for a population parameter that is defined as the solution to a smooth estimating equation with data collected using stratified simple random sampling. The imputation classes are assumed to be formed across strata. Fractional imputation with a fixed number of random draws is used to obtain an imputed estimating function. An empirical likelihood inference method under the fractional imputation is proposed and its asymptotic properties are derived. Two asymptotically correct bootstrap methods are developed for constructing the desired CIs. In a simulation study, the proposed bootstrap methods are shown to outperform traditional bootstrap methods and some non‐bootstrap competitors under various simulation settings. The Canadian Journal of Statistics 47: 281–301; 2019 © 2019 Statistical Society of Canada

12.
In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not allow for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the stochastic search variable selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset.

13.
For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative hypothesis requires complex analytic approximations, and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say p≤20. On the other hand, assuming that the data dimension p as well as the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations are proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large p and large n context, but also for moderately large data dimensions such as p=30 or p=50. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in multivariate analysis of variance which is valid for high-dimensional data.

14.
For testing the adequacy of a parametric model in regression, various test statistics can be constructed on the basis of a marked empirical process of residuals. By using a discretized version of the decomposition of the corresponding Gaussian limiting process into its principal components, we obtain a test statistic with an asymptotic chi-squared distribution under the null hypothesis. We investigate the consistency of this test statistic and of the estimators needed to compute it. Numerical experiments indicate that the distributional approximations already work for small to moderate sample sizes and reveal that the test has good power properties against a variety of alternatives. The test has a simple implementation. We present an application to a real-data example for testing the adequacy of a possible heteroscedastic exponential model.

15.
S. Bedbur, U. Kamps. Statistics, 2017, 51(5): 1132-1142
As a submodel of generalized order statistics with two unknown model parameters, m-generalized order statistics may serve as a simple model for ordered quantities in a given application. It is shown that the joint distribution of m-generalized order statistics has a representation as a regular exponential family in the model parameters, as is the case for the comprising model. Utilizing this finding, a minimal sufficient and complete statistic is obtained along with distributional properties. Joint maximum likelihood estimation of the parameters is considered, and strong consistency and asymptotic efficiency of the estimator are established. A test is provided to decide whether a restriction to the submodel is reasonable.

16.
Motivated by an application in Electrical Engineering, we derive the exact distribution of the sum of the largest n−k out of n normally distributed random variables, with differing mean values. Comparisons are made with two normal approximations to this distribution: one arising from the asymptotic negligibility of the omitted order statistics and one from the theory of L-statistics. The latter approximation is found to be in excellent agreement with the exact distribution.

17.
This paper is an overview of a unified framework for analyzing designed experiments with univariate or multivariate responses. Both categorical and continuous design variables are considered. To handle unbalanced data, we introduce the so-called Type II* sums of squares. This means that the results are independent of the scale chosen for continuous design variables. Furthermore, it does not matter whether two-level variables are coded as categorical or continuous. Overall testing of all responses is done by 50-50 MANOVA, which handles several highly correlated responses. Univariate p-values for each response are adjusted by using rotation testing. To illustrate multivariate effects, mean values and mean predictions are displayed in a principal component score plot or directly as curves. For the unbalanced cases, we introduce a new variant of adjusted means, which are independent of the coding of two-level variables. The methodology is exemplified by case studies from cheese and fish pudding production.

18.
In this paper, we derive some recurrence relations satisfied by the single and the product moments of order statistics arising from n independent and non-identically distributed power function random variables. These recurrence relations will enable one to compute all the single and the product moments of all order statistics in a simple recursive manner. The results for the multiple-outlier model are deduced as special cases. The results are further generalized to the case of truncated power function random variables.

19.
Using a multivariate latent variable approach, this article proposes some new general models to analyze correlated bounded continuous and categorical (nominal and/or ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some bounded continuous responses are substituted for continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.

20.
Using the concept of near-exact approximation to a distribution we developed two different near-exact approximations to the distribution of the product of an odd number of particular independent Beta random variables (r.v.'s). One of them is a particular generalized near-integer Gamma (GNIG) distribution and the other is a mixture of two GNIG distributions. These near-exact distributions are mostly adequate to be used as a basis for approximations of distributions of several statistics used in multivariate analysis. By factoring the characteristic function (c.f.) of the logarithm of the product of the Beta r.v.'s, and then replacing a suitably chosen factor of that c.f. by an adequate asymptotic result it is possible to obtain what we call a near-exact c.f., which gives rise to the near-exact approximation to the exact distribution. Depending on the asymptotic result used to replace the chosen parts of the c.f., one may obtain different near-exact approximations. Moments from the two near-exact approximations developed are compared with the exact ones. The two approximations are also compared with each other, namely in terms of moments and quantiles.
