Similar Documents
20 similar records found
1.
The analysis of incomplete contingency tables is a practical and interesting problem. In this paper, we provide characterizations of the various missing mechanisms of a variable in terms of response and non-response odds for two- and three-dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for these tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are further confirmed by simulation studies.
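The odds-based checks described in this abstract can be illustrated with a toy example. The counts below are hypothetical and the computation is only a sketch of the idea, not the paper's actual procedure:

```python
# Hypothetical 2x2 table with nonresponse on the column variable:
# n[i][j] are fully classified counts; m[i] are supplementary counts
# for which only the row category i was recorded.
n = [[40, 10],
     [20, 30]]
m = [10, 15]

# Response odds for each row: fully observed vs column-missing counts.
response_odds = [sum(n[i]) / m[i] for i in range(2)]
print(response_odds)  # [5.0, 3.33...]

# Under MCAR the response odds are constant across rows; variation with
# the observed row variable is consistent with MAR, while dependence on
# the unobserved column value would indicate NMAR.
```

Here the odds differ markedly between the rows (5.0 vs about 3.3), so in this toy data the MCAR assumption would look doubtful.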

2.
Latent class analysis (LCA) has been found to have important applications in social and behavioral sciences for modeling categorical response variables, and nonresponse is typical when collecting data. In this study, the nonresponse mainly included “contingency questions” and real “missing data.” The primary objective of this research was to evaluate the effects of some potential factors on model selection indices in LCA with nonresponse data.

We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate, and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size, and the number of items are also significant. Our simulation results indicate that the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.
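The study compares eight information criteria; as a reminder of how the two most common ones trade fit against complexity, here is a minimal sketch with hypothetical log-likelihoods for a 2-class and a 3-class model (the fits are invented for illustration, not taken from the study):

```python
import math

def aic(loglik, n_params):
    # Akaike information criterion: -2*loglik + 2k
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: -2*loglik + k*log(n)
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: (log-likelihood, number of parameters), n = 500 cases.
fits = {2: (-1520.3, 9), 3: (-1511.8, 14)}
for classes, (ll, k) in fits.items():
    print(classes, aic(ll, k), bic(ll, k, 500))
```

With these numbers AIC prefers the 3-class model while BIC, whose penalty grows with the sample size, prefers the 2-class model, which is exactly why accuracy rates of different criteria can diverge in simulations like this one.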


3.
Latent class analysis (LCA) has been found to have important applications in social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly included ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency questions and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be mitigated by increasing the sample size or the number of items.

4.
A two-way contingency table in which both variables have the same categories is termed a symmetric table. In many applications, because of the social processes involved, most of the observations lie on the main diagonal and the off-diagonal counts are small. For these tables, the model of independence is implausible and interest is then focussed on the off-diagonal cells and the models of quasi-independence and quasi-symmetry. For ordinal variables, a linear-by-linear association model can be used to model the interaction structure. For sparse tables, large-sample goodness-of-fit tests are often unreliable and one should use an exact test. In this paper, we review exact tests and the computing problems involved. We propose new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models. We propose that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately and describe how we implemented our proposals. Two examples are presented.
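As a much simpler cousin of the exact tests discussed in this abstract, Fisher's exact test for a 2×2 table can be computed in exact rational arithmetic with Python's `fractions` module. This illustrates only the rational-arithmetic idea, not the paper's recursive algorithms for quasi-independence or quasi-symmetry:

```python
from fractions import Fraction
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    computed in exact rational arithmetic (no floating-point rounding)."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, c1)
    def prob(x):  # hypergeometric probability that cell (1,1) equals x
        return Fraction(comb(r1, x) * comb(r2, c1 - x), denom)
    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)

print(fisher_exact_p(3, 1, 1, 3))  # exact Fraction: 17/35
```

Because the result is a `Fraction`, the p-value is exact, which is the point the paper makes about symbolic computation for sparse tables.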

5.
This paper is concerned with the maximum likelihood estimation and the likelihood ratio test for hierarchical log-linear models of multidimensional contingency tables with missing data. The problems of estimation and testing for a high-dimensional contingency table can be reduced to those for a class of low-dimensional tables. In some cases, the incomplete data in the high-dimensional table become complete in the low-dimensional tables; the reduction can also indicate how much the incomplete data contribute to the estimation and the test.
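A standard tool for maximum likelihood estimation with partially classified counts, related to (but not the same as) the reduction approach above, is the EM algorithm. The following toy 2×2 sketch under MAR uses hypothetical counts:

```python
# Fully classified counts n[i][j]; m[i] are supplementary counts for
# which the column entry is missing. Hypothetical data, MAR assumed.
n = [[30.0, 10.0], [20.0, 40.0]]
m = [8.0, 12.0]

p = [[0.25, 0.25], [0.25, 0.25]]  # initial cell probabilities
for _ in range(200):
    # E-step: allocate each row's missing counts by current conditionals
    filled = [[n[i][j] + m[i] * p[i][j] / (p[i][0] + p[i][1])
               for j in range(2)] for i in range(2)]
    # M-step: MLE of the saturated model is just the sample proportions
    total = sum(sum(row) for row in filled)
    p = [[filled[i][j] / total for j in range(2)] for i in range(2)]

print(p)  # converges to (n_ij + m_i * n_ij / n_i+) / N
```

At the fixed point the missing row counts are spread according to the observed within-row conditionals, so for these numbers the estimate is [[0.3, 0.1], [0.2, 0.4]].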

6.
In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with a penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for the considered models with incomplete data, new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive feature of our method is its generality: it can be extended, via the EM algorithm, to various situations with missing observations, and in particular, when no data are missing, our new model selection criteria reduce to the classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also compare missing-data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

7.
Bayesian models for relative archaeological chronology building   (Times cited: 1; self-citations: 0; citations by others: 1)
For many years, archaeologists have postulated that the numbers of various artefact types found within excavated features should give insight about their relative dates of deposition even when stratigraphic information is not present. A typical data set used in such studies can be reported as a cross-classification table (often called an abundance matrix or, equivalently, a contingency table) of excavated features against artefact types. Each entry of the table represents the number of a particular artefact type found in a particular archaeological feature. Methodologies for attempting to identify temporal sequences on the basis of such data are commonly referred to as seriation techniques. Several different procedures for seriation including both parametric and non-parametric statistics have been used in an attempt to reconstruct relative chronological orders on the basis of such contingency tables. We develop some possible model-based approaches that might be used to aid in relative, archaeological chronology building. We use the recently developed Markov chain Monte Carlo method based on Langevin diffusions to fit some of the models proposed. Predictive Bayesian model choice techniques are then employed to ascertain which of the models that we develop are most plausible. We analyse two data sets taken from the literature on archaeological seriation.

8.
For square contingency tables with ordered categories, one may wish to analyze collapsed tables in which some adjacent categories of the original table are combined. This paper proposes three kinds of new models which have the structure of point-symmetry (PS), quasi point-symmetry and marginal point-symmetry for collapsed square tables. This paper also gives a decomposition of the PS model for collapsed square tables. Data on occupational mobility between fathers and their daughters are analyzed using the new models.
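The point-symmetry structure underlying these models says that cell probabilities are invariant under reflection through the table's centre. A Pearson-type check of that structure can be sketched as follows; the statistic and the data are illustrative, not taken from the paper:

```python
# Pearson-type statistic for point symmetry (PS) in a rectangular table:
# under PS, p[i][j] = p[R-1-i][C-1-j] (indices reflected through the centre).
def point_symmetry_stat(n):
    R, C = len(n), len(n[0])
    stat = 0.0
    for i in range(R):
        for j in range(C):
            mirror = n[R - 1 - i][C - 1 - j]
            expected = (n[i][j] + mirror) / 2.0  # MLE under PS
            if expected > 0:
                stat += (n[i][j] - expected) ** 2 / expected
    return stat

table = [[10, 5, 3],
         [4, 8, 6],
         [2, 7, 9]]
print(point_symmetry_stat(table))             # > 0: departure from PS
print(point_symmetry_stat([[4, 1], [1, 4]]))  # 0.0: exactly point-symmetric
```

A table that already satisfies point symmetry (like the second example) yields a statistic of exactly zero; any asymmetry about the centre contributes positively.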

9.
The marginal totals of a contingency table can be rearranged to form a new table. If at least two of these totals include the same cell of the original table, the new table cannot be treated as an ordinary contingency table. An iterative method is proposed to calculate maximum likelihood estimators for the expected cell frequencies of the original table under the assumption that some marginal totals (or more generally, some linear functions) of these expected frequencies satisfy a log-linear model. In some cases, a table of correlated marginal totals is treated as if it were an ordinary contingency table. The effects of ignoring the special structure of the marginal table on the distribution of the goodness-of-fit test statistics are discussed and illustrated, with special reference to stationary Markov chains.
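The iterative fitting of marginal constraints described here is reminiscent of classical iterative proportional fitting (IPF), which handles the ordinary case where the margins do not share cells. A minimal IPF sketch (not the paper's method for correlated margins):

```python
# Iterative proportional fitting (IPF): rescale a table until it matches
# target row and column totals. Handles only ordinary, non-overlapping
# margins; the paper's correlated-margin setting is more delicate.
def ipf(x, row_targets, col_targets, iters=100):
    R, C = len(x), len(x[0])
    x = [row[:] for row in x]
    for _ in range(iters):
        for i in range(R):                      # match row totals
            s = sum(x[i])
            x[i] = [v * row_targets[i] / s for v in x[i]]
        for j in range(C):                      # match column totals
            s = sum(x[i][j] for i in range(R))
            for i in range(R):
                x[i][j] *= col_targets[j] / s
    return x

fitted = ipf([[1.0, 1.0], [1.0, 1.0]], [50, 50], [30, 70])
print(fitted)  # independence fit: [[15, 35], [15, 35]]
```

Starting from a uniform table, IPF converges to the independence fit consistent with the given margins.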

10.
This paper studies the multinomial model for 2×2 contingency table data with some cell counts missing. Various hypotheses of interest, including row-column independence, are tested by using Bayes factors, which represent the ratio of the posterior odds to the prior odds for the null hypothesis. The Dirichlet-Beta family of prior distributions is considered for the multinomial parameters conditional on the complement of the null hypothesis. The Bayes factor for the incomplete data is a mixture of the Bayes factors for the different possibilities for the full data.
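For intuition about Bayes factors of this kind, here is a much simpler binomial analogue: a point null p = 1/2 against a uniform Beta(1,1) alternative. This is an illustration of the posterior-odds/prior-odds ratio, not the paper's Dirichlet-Beta setup for incomplete tables:

```python
from math import comb

def bayes_factor_point_null(k, n):
    """BF_01 for H0: p = 1/2 vs H1: p ~ Beta(1,1), given k successes in n
    Bernoulli trials. Under H1 the marginal likelihood of k is uniform:
    integrating C(n,k) p^k (1-p)^(n-k) over p in [0,1] gives 1/(n+1)."""
    m0 = comb(n, k) * 0.5 ** n       # P(data | H0)
    m1 = 1.0 / (n + 1)               # P(data | H1)
    return m0 / m1

print(bayes_factor_point_null(7, 10))  # 1.2890625: data barely favor H0
```

Values near 1 mean the data are nearly uninformative about the hypotheses; 7 successes out of 10 only weakly discriminates p = 1/2 from the uniform alternative.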

11.
ABSTRACT

This paper analyses the behaviour of goodness-of-fit tests for regression models. To this end, it uses statistics based on an estimation of the integrated regression function with missing observations either in the response variable or in some of the covariates. It proposes several versions of one empirical process, constructed from a previous estimation, that uses only the complete observations or replaces the missing observations with imputed values. In the case of missing covariates, a link model is used to fill in the missing observations from other complete covariates. In all the situations, bootstrap methodology is used to calibrate the distribution of the test statistics. A broad simulation study compares the different procedures based on empirical regression methodology with smoothed tests previously studied in the literature. The comparison reflects the effect of the correlation between the covariates in the tests based on the imputed sample for missing covariates. In addition, the paper proposes a computational binning strategy to evaluate the tests based on an empirical process for large data sets. Finally, two applications to real data illustrate the performance of the tests.

12.
In clinical practice, the profile of each subject's CD4 response from a longitudinal study may follow a ‘broken stick’ like trajectory, indicating multiple phases of increase and/or decline in response. Such multiple phases (changepoints) may be important indicators to help quantify treatment effect and improve management of patient care. Although it is common practice in the literature to analyze complex AIDS longitudinal data using nonlinear mixed-effects (NLME) or nonparametric mixed-effects (NPME) models, estimating changepoints with NLME or NPME models is challenging due to their complicated model formulations. In this paper, we propose a changepoint mixed-effects model with random subject-specific parameters, including the changepoint, for the analysis of longitudinal CD4 cell counts for HIV infected subjects following highly active antiretroviral treatment. The longitudinal CD4 data in this study may exhibit departures from symmetry, may encounter missing observations due to various reasons, which are likely to be non-ignorable in the sense that missingness may be related to the missing values, and may be censored at the time of the subject going off study-treatment, which is a potentially informative dropout mechanism. Inferential procedures can be complicated dramatically when longitudinal CD4 data with asymmetry (skewness), incompleteness and informative dropout are observed in conjunction with an unknown changepoint. Our objective is to address the simultaneous impact of skewness, missingness and informative censoring by jointly modeling the CD4 response and dropout time processes under a Bayesian framework. The method is illustrated using a real AIDS data set to compare potential models with various scenarios, and some results of interest are presented.

13.
Strict collapsibility and model collapsibility are two important concepts associated with the dimension reduction of a multidimensional contingency table, without losing the relevant information. In this paper, we obtain some necessary and sufficient conditions for the strict collapsibility of the full model, with respect to an interaction factor or a set of interaction factors, based on the interaction parameters of the conditional/layer log-linear models. For hierarchical log-linear models, we also present necessary and sufficient conditions for the full model to be model collapsible, based on the conditional interaction parameters. We discuss both the case where a single variable is conditioned on and the case where a set of variables is conditioned on. The connections between strict collapsibility and model collapsibility are also pointed out. Our results are illustrated through suitable examples, including a real-life application.

14.
Categorical data frequently arise in applications in the Social Sciences. In such applications, the class of log-linear models, based on either a Poisson or (product) multinomial response distribution, is a flexible model class for inference and prediction. In this paper we consider the Bayesian analysis of both Poisson and multinomial log-linear models. It is often convenient to model multinomial or product multinomial data as observations of independent Poisson variables. For multinomial data, Lindley (1964) [20] showed that this approach leads to valid Bayesian posterior inferences when the prior density for the Poisson cell means factorises in a particular way. We develop this result to provide a general framework for the analysis of multinomial or product multinomial data using a Poisson log-linear model. Valid finite population inferences are also available, which can be particularly important in modelling social data. We then focus particular attention on multivariate normal prior distributions for the log-linear model parameters. Here, an improper prior distribution for certain Poisson model parameters is required for valid multinomial analysis, and we derive conditions under which the resulting posterior distribution is proper. We also consider the construction of prior distributions across models, and for model parameters, when uncertainty exists about the appropriate form of the model. We present classes of Poisson and multinomial models, invariant under certain natural groups of permutations of the cells. We demonstrate that, if prior belief concerning the model parameters is also invariant, as is the case in a ‘reference’ analysis, then the choice of prior distribution is considerably restricted. The analysis of multivariate categorical data in the form of a contingency table is considered in detail. We illustrate the methods with two examples.
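The Poisson-multinomial connection invoked here can be checked numerically: conditional on their total, independent Poisson counts follow a multinomial with probabilities proportional to the Poisson means. A stdlib-only sketch with arbitrary small numbers:

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    return mu ** k * exp(-mu) / factorial(k)

def multinomial_pmf(xs, probs):
    n = sum(xs)
    coef = factorial(n)
    p = 1.0
    for x, q in zip(xs, probs):
        coef //= factorial(x)   # multinomial coefficient, exact integer
        p *= q ** x
    return coef * p

# Independent Poisson cell counts with means mus, conditioned on the total,
# follow a multinomial with probabilities mu_j / sum(mu).
mus = [2.0, 3.0, 5.0]
xs = [1, 2, 4]
joint = 1.0
for x, mu in zip(xs, mus):
    joint *= poisson_pmf(x, mu)
total_pmf = poisson_pmf(sum(xs), sum(mus))
probs = [mu / sum(mus) for mu in mus]
print(joint / total_pmf, multinomial_pmf(xs, probs))  # equal values
```

The two printed numbers agree to floating-point precision, which is the identity that lets multinomial log-linear models be fitted through a Poisson likelihood.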

15.
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.

16.
This paper presents a matrix formulation for log-linear model analysis of the incomplete contingency table which arises from multiple recapture census data. Explicit matrix product expressions are given for the asymptotic covariance structure of the maximum likelihood estimators of both the log-linear model parameter vector and the predicted value vector for the observed and missing cells. These results are illustrated for data pertaining to a population of children possessing a common congenital anomaly.

17.
In the analysis of non-monotone missing data patterns in multinomial distributions for contingency tables, it is known that explicit MLEs of the unknown parameters cannot be obtained. Iterative procedures such as the EM-algorithm are therefore required to obtain the MLEs. These iterative procedures, however, may offer several potential difficulties. Andersson and Perlman [Ann. Statist. 21 (1993) 1318–1358] introduced lattice conditional independence (LCI) models for multivariate normal distributions, which can be applied to the analysis of non-monotone missing observations in continuous data (Andersson and Perlman, Statist. Probab. Lett. 12 (1991) 465–486). In this paper, we show that LCI models may also be applied to the analysis of categorical data with non-monotone missing data patterns. Under a parsimonious set of LCI assumptions naturally determined by the observed data pattern, the likelihood function for the observed data can be factored as in the monotone case and explicit MLEs can be obtained for the unknown parameters. Furthermore, the LCI assumptions can be tested by explicit likelihood ratio tests.

18.
The application of the Bayesian paradigm to model comparison can be problematic. In particular, prior distributions on the parameter space of each candidate model require special care. While it is well known that improper priors cannot be routinely used for Bayesian model comparison, we claim that the use of proper conventional priors under each model should also be regarded as suspicious, especially when comparing models having different dimensions. The basic idea is that priors should not be assigned separately under each model; rather they should be related across models, in order to acquire some degree of compatibility, and thus allow fairer and more robust comparisons. In this connection, the intrinsic prior as well as the expected posterior prior (EPP) methodology represent a useful tool. In this paper we develop a procedure based on EPP to perform Bayesian model comparison for discrete undirected decomposable graphical models, although our method could be adapted to deal also with directed acyclic graph models. We present two possible approaches: one based on imaginary data, and one that makes use of a limited number of actual observations. The methodology is illustrated through the analysis of a 2×3×4 contingency table.

19.
Pharmacokinetic (PK) data often contain concentration measurements below the quantification limit (BQL). While specific values cannot be assigned to these observations, nevertheless these observed BQL data are informative and generally known to be lower than the lower limit of quantification (LLQ). Setting BQLs as missing data violates the usual missing at random (MAR) assumption applied to the statistical methods, and therefore leads to biased or less precise parameter estimation. By definition, these data lie within the interval [0, LLQ], and can be considered as censored observations. Statistical methods that handle censored data, such as maximum likelihood and Bayesian methods, are thus useful in the modelling of such data sets. The main aim of this work was to investigate the impact of the amount of BQL observations on the bias and precision of parameter estimates in population PK models (non‐linear mixed effects models in general) under maximum likelihood method as implemented in SAS and NONMEM, and a Bayesian approach using Markov chain Monte Carlo (MCMC) as applied in WinBUGS. A second aim was to compare these different methods in dealing with BQL or censored data in a practical situation. The evaluation was illustrated by simulation based on a simple PK model, where a number of data sets were simulated from a one‐compartment first‐order elimination PK model. Several quantification limits were applied to each of the simulated data to generate data sets with certain amounts of BQL data. The average percentage of BQL ranged from 25% to 75%. Their influence on the bias and precision of all population PK model parameters such as clearance and volume distribution under each estimation approach was explored and compared. Copyright © 2009 John Wiley & Sons, Ltd.
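The censoring idea can be illustrated with a left-censored normal log-likelihood: each BQL observation contributes the probability of falling below the LLQ rather than a point density. This is a Tobit-style sketch with hypothetical values; the paper's population PK models are nonlinear mixed-effects models, not this simple:

```python
import math

def normal_logpdf(x, mu, sigma):
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def normal_logcdf(x, mu, sigma):
    z = (x - mu) / (sigma * math.sqrt(2))
    return math.log(0.5 * (1 + math.erf(z)))

def censored_loglik(observed, n_bql, llq, mu, sigma):
    """Log-likelihood treating BQL values as left-censored at llq:
    each of the n_bql censored observations contributes log P(X < llq)."""
    ll = sum(normal_logpdf(x, mu, sigma) for x in observed)
    ll += n_bql * normal_logcdf(llq, mu, sigma)
    return ll

# Hypothetical (log-)concentrations, 3 of 5 samples below LLQ = 0.5
print(censored_loglik([1.2, 0.8], n_bql=3, llq=0.5, mu=1.0, sigma=0.5))
```

Maximizing this likelihood over (mu, sigma) uses the information in the censored observations instead of discarding them, which is what reduces the bias the paper investigates.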

20.
ABSTRACT

This paper extends the classical methods of analysis of a two-way contingency table to the fuzzy environment for two cases: (1) when the available sample of observations is reported as imprecise data, and (2) the case in which we prefer to categorize the variables based on linguistic terms rather than as crisp quantities. For this purpose, the α-cuts approach is used to extend the usual concepts of the test statistic and p-value to the fuzzy test statistic and fuzzy p-value. In addition, some measures of association are extended to the fuzzy version in order to evaluate the dependence in such contingency tables. Some practical examples are provided to explain the applicability of the proposed methods in real-world problems.
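The α-cut construction can be sketched for a fuzzy chi-square statistic with triangular fuzzy counts. Evaluating the crisp statistic at the interval endpoints of each cut is only a heuristic approximation of the cut of the fuzzy statistic, and the counts here are hypothetical:

```python
from itertools import product

def chi2_stat(table):
    """Crisp Pearson chi-square statistic for independence."""
    R, C = len(table), len(table[0])
    total = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(R)) for j in range(C)]
    stat = 0.0
    for i in range(R):
        for j in range(C):
            e = rows[i] * cols[j] / total
            stat += (table[i][j] - e) ** 2 / e
    return stat

# Triangular fuzzy counts (lo, mode, hi); the alpha-cut at level a is the
# interval [lo + a*(mode - lo), hi - a*(hi - mode)].
fuzzy = [[(18, 20, 22), (8, 10, 12)],
         [(9, 10, 11), (28, 30, 32)]]

def stat_range(fuzzy, a):
    """Approximate the alpha-cut of the fuzzy statistic by evaluating the
    crisp statistic at all interval endpoints (heuristic, not exact)."""
    cuts = [[(lo + a * (m - lo), hi - a * (hi - m)) for (lo, m, hi) in row]
            for row in fuzzy]
    vals = []
    for pick in product(*[(0, 1)] * 4):
        t = [[cuts[i][j][pick[2 * i + j]] for j in range(2)]
             for i in range(2)]
        vals.append(chi2_stat(t))
    return min(vals), max(vals)

print(stat_range(fuzzy, 1.0))  # degenerates to the crisp statistic
print(stat_range(fuzzy, 0.0))  # widest interval of statistic values
```

At α = 1 every cut collapses to the mode, so the "fuzzy" statistic reduces to the ordinary crisp chi-square value; lower α gives wider intervals, conveying the imprecision of the data.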


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号