期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

The impact of dichotomization in longitudinal data analysis: a simulation study

Bongin Yoo 《Pharmaceutical statistics》2010,9(4):298-312

In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi‐parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI‐GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI‐GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood‐based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI‐GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE‐based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

2.

A pseudo likelihood approach to analyze rate difference of binary response data in longitudinal factorial studies

Chunpeng Fan 《统计学通讯:模拟与计算》2018,47(7):2169-2183

Although Fan showed that the mixed-effects model for repeated measures (MMRM) is appropriate to analyze complete longitudinal binary data in terms of the rate difference, they focused on using the generalized estimating equations (GEE) to make statistical inference. The current article emphasizes validity of the MMRM when the normal-distribution-based pseudo likelihood approach is used to make inference for complete longitudinal binary data. For incomplete longitudinal binary data with missing at random missing mechanism, however, the MMRM, using either the GEE or the normal-distribution-based pseudo likelihood inferential procedure, gives biased results in general and should not be used for analysis. 相似文献

3.

Impact of missing data on type 1 error rates in non‐inferiority trials

Bongin Yoo 《Pharmaceutical statistics》2010,9(2):87-99

In this paper, a simulation study is conducted to systematically investigate the impact of different types of missing data on six different statistical analyses: four different likelihood‐based linear mixed effects models and analysis of covariance (ANCOVA) using two different data sets, in non‐inferiority trial settings for the analysis of longitudinal continuous data. ANCOVA is valid when the missing data are completely at random. Likelihood‐based linear mixed effects model approaches are valid when the missing data are at random. Pattern‐mixture model (PMM) was developed to incorporate non‐random missing mechanism. Our simulations suggest that two linear mixed effects models using unstructured covariance matrix for within‐subject correlation with no random effects or first‐order autoregressive covariance matrix for within‐subject correlation with random coefficient effects provide well control of type 1 error (T1E) rate when the missing data are completely at random or at random. ANCOVA using last observation carried forward imputed data set is the worst method in terms of bias and T1E rate. PMM does not show much improvement on controlling T1E rate compared with other linear mixed effects models when the missing data are not at random but is markedly inferior when the missing data are at random. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

4.

The two-sample problem with multivariate censored data: a new rank test family

Eve Leconte Thierry Moreau Joseph Lellouch 《统计学通讯:模拟与计算》2013,42(4):1061-1076

A new rank test family is proposed to test the equality of two multivariate failure times distributions with censored observations. The tests are very simple: they are based on a transformation of the multivariate rank vectors to a univariate rank score and the resulting statistics belong to the familiar class of the weighted logrank test statistics. The new procedure is also applicable to multivariate observations in general, such as repeated measures, some of which may be missing. To investigate the performance of the proposed tests, a simulation study was conducted with bivariate exponential models for various censoring rates. The size and power of these tests against Lehmann alternatives were compared to the size and power of two other tests (Wei and Lachin, 1984 and Wei and Knuiman, 1987). In all simulations the new procedures provide a relatively good power and an accurate control over the size of the test. A real example from the National Cooperative Gallstone Study is given 相似文献

5.

Methods for repeated measures data analysis with missing values

《Journal of statistical planning and inference》1999,77(2):221-236

There are various techniques for dealing with incomplete data; some are computationally highly intensive and others are not as computationally intensive, while all may be comparable in their efficiencies. In spite of these developments, analysis using only the complete data subset is performed when using popular statistical software. In an attempt to demonstrate the efficiencies and advantages of using all available data, we compared several approaches that are relatively simple but efficient alternatives to those using the complete data subset for analyzing repeated measures data with missing values, under the assumption of a multivariate normal distribution of the data. We also assumed that the missing values occur in a monotonic pattern and completely at random. The incomplete data procedure is demonstrated to be more powerful than the procedure of using the complete data subset, generally when the within-subject correlation gets large. One other principal finding is that even with small sample data, for which various covariance models may be indistinguishable, the empirical size and power are shown to be sensitive to misspecified assumptions about the covariance structure. Overall, the testing procedures that do not assume any particular covariance structure are shown to be more robust in keeping the empirical size at the nominal level than those assuming a special structure. 相似文献

6.

Comparison of alternative imputation methods for ordinal data

Federica Cugnata 《统计学通讯:模拟与计算》2017,46(1):315-330

In this article, we compare alternative missing imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods are considered, as are univariate and multivariate approaches. The first step consists of running a simulation study designed by varying the parameters of the CUB model, to consider and compare CUB models as well as other methods of missing imputation. We use real datasets on which to base the comparison between our approach and some general methods of missing imputation for various missing data mechanisms. 相似文献

7.

Missing data techniques for multilevel data: implications of model misspecification

Anne C. Black Ofer Harel D. Betsy McCoach 《Journal of applied statistics》2011,38(9):1845-1865

When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD. 相似文献

8.

A general joint model for longitudinal measurements and competing risks survival data with heterogeneous random effects

Xin Huang Gang Li Robert M. Elashoff Jianxin Pan 《Lifetime data analysis》2011,17(1):80-100

This article studies a general joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the competing risks survival data, and a regression sub-model for the variance–covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. The model provides a useful approach to adjust for non-ignorable missing data due to dropout for the longitudinal outcome, enables analysis of the survival outcome with informative censoring and intermittently measured time-dependent covariates, as well as joint analysis of the longitudinal and survival outcomes. Unlike previously studied joint models, our model allows for heterogeneous random covariance matrices. It also offers a framework to assess the homogeneous covariance assumption of existing joint models. A Bayesian MCMC procedure is developed for parameter estimation and inference. Its performances and frequentist properties are investigated using simulations. A real data example is used to illustrate the usefulness of the approach. 相似文献

9.

Evaluation of missing data mechanisms in two and three dimensional incomplete tables

Sayan Ghosh Palaniappan Vellaisamy 《Journal of the Korean Statistical Society》2019,48(2):297-313

The analysis of incomplete contingency tables is a practical and an interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies. 相似文献

10.

The Muscatine children's obesity data reanalysed using pattern mixture models

Anders Ekholm & Chris Skinner 《Journal of the Royal Statistical Society. Series C, Applied statistics》1998,47(2):251-263

A set of longitudinal binary, partially incomplete, data on obesity among children in the USA is reanalysed. The multivariate Bernoulli distribution is parameterized by the univariate marginal probabilities and dependence ratios of all orders, which together support maximum likelihood inference. The temporal association of obesity is strong and complex but stationary. We fit a saturated model for the distribution of response patterns and find that non-response is missing completely at random for boys but that the probability of obesity is consistently higher among girls who provided incomplete records than among girls who provided complete records. We discuss the statistical and substantive features of, respectively, pattern mixture and selection models for this data set. 相似文献

11.

A monte carlo comparison of the smoothing,scoring and em algorithms for dispersion matrix estimation with incomplete growth curve data

《Journal of Statistical Computation and Simulation》2012,82(1-2):77-92

Incomplete growth curve data often result from missing or mistimed observations in a repeated measures design. Virtually all methods of analysis rely on the dispersion matrix estimates. A Monte Carlo simulation was used to compare three methods of estimation of dispersion matrices for incomplete growth curve data. The three methods were: 1) maximum likelihood estimation with a smoothing algorithm, which finds the closest positive semidefinite estimate of the pairwise estimated dispersion matrix; 2) a mixed effects model using the EM (estimation maximization) algorithm; and 3) a mixed effects model with the scoring algorithm. The simulation included 5 dispersion structures, 20 or 40 subjects with 4 or 8 observations per subject and 10 or 30% missing data. In all the simulations, the smoothing algorithm was the poorest estimator of the dispersion matrix. In most cases, there were no significant differences between the scoring and EM algorithms. The EM algorithm tended to be better than the scoring algorithm when the variances of the random effects were close to zero, especially for the simulations with 4 observations per subject and two random effects. 相似文献

12.

Impact of the non-distinctness and non-ignorability on the inference by multiple imputation in multivariate multilevel data: a simulation assessment

Recai Yucel 《Journal of Statistical Computation and Simulation》2017,87(9):1813-1826

Multiple imputation (MI) is an increasingly popular method for analysing incomplete multivariate data sets. One of the most crucial assumptions of this method relates to mechanism leading to missing data. Distinctness is typically assumed, which indicates a complete independence of mechanisms underlying missingness and data generation. In addition, missing at random or missing completely at random is assumed, which explicitly states under which conditions missingness is independent of observed data. Despite common use of MI under these assumptions, plausibility and sensitivity to these fundamental assumptions have not been well-investigated. In this work, we investigate the impact of non-distinctness and non-ignorability. In particular, non-ignorability is due to unobservable cluster-specific effects (e.g. random-effects). Through a comprehensive simulation study, we show that MI inferences suggest that nonignoriability due to non-distinctness do not immediately imply dismal performance while non-ignorability due to missing not at random leads to quite subpar performance. 相似文献

13.

Bias of factor loadings from questionnaire data with imputed scores

《Journal of Statistical Computation and Simulation》2012,82(1):13-23

This study investigated the bias of factor loadings obtained from incomplete questionnaire data with imputed scores. Three models were used to generate discrete ordered rating scale data typical of questionnaires, also known as Likert data. These methods were the multidimensional polytomous latent trait model, a normal ogive item response theory model, and the discretized normal model. Incomplete data due to nonresponse were simulated using either missing completely at random or not missing at random mechanisms. Subsequently, for each incomplete data matrix, four imputation methods were applied for imputing item scores. Based on a completely crossed six-factor design, it was concluded that in general, bias was small for all data simulation methods and all imputation methods, and under all nonresponse mechanisms. Imputation method, two-way-plus-error, had the smallest bias in the factor loadings. Bias based on the discretized normal model was greater than that based on the other two models. 相似文献

14.

Model selection of generalized partially linear models with missing covariates

Ying-Zi FuXue-Dong Chen 《Journal of statistical planning and inference》2012,142(1):126-138

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criterions for GPLMs with missing covariates are proposed under two different missingness mechanism, say, missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on EM algorithm, especially when no missing data involved, our new model selection criterions are reduced to classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also can compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criterions are investigated. A simulation study and a real example are used to illustrate the proposed methodology. 相似文献

15.

Comparison of methods for incomplete repeated measures data analysis in small samples

《Journal of statistical planning and inference》2006,136(1):235-247

This paper presents missing data methods for repeated measures data in small samples. Most methods currently available are for large samples. In particular, no studies have compared the performance of multiple imputation methods to that of non-imputation incomplete analysis methods. We first develop a strategy for multiple imputations for repeated measures data under a cell-means model that is applicable for any multivariate data with small samples. Multiple imputation inference procedures are applied to the resulting multiply imputed complete data sets. Comparisons to other available non-imputation incomplete data methods is made via simulation studies to conclude that there is not much gain in using the computer intensive multiple imputation methods for small sample repeated measures data analysis in terms of the power of testing hypotheses of parameters of interest. 相似文献

16.

Skew-mixed effects model for multivariate longitudinal data with categorical outcomes and missingness

S. Eftekhari Mahabadi E. Rahimi Jafari 《Journal of applied statistics》2018,45(12):2182-2201

A longitudinal study commonly follows a set of variables, measured for each individual repeatedly over time, and usually suffers from incomplete data problem. A common approach for dealing with longitudinal categorical responses is to use the Generalized Linear Mixed Model (GLMM). This model induces the potential relation between response variables over time via a vector of random effects, assumed to be shared parameters in the non-ignorable missing mechanism. Most GLMMs assume that the random-effects parameters follow a normal or symmetric distribution and this leads to serious problems in real applications. In this paper, we propose GLMMs for the analysis of incomplete multivariate longitudinal categorical responses with a non-ignorable missing mechanism based on a shared parameter framework with the less restrictive assumption of skew-normality for the random effects. These models may contain incomplete data with monotone and non-monotone missing patterns. The performance of the model is evaluated using simulation studies and a well-known longitudinal data set extracted from a fluvoxamine trial is analyzed to determine the profile of fluvoxamine in ambulatory clinical psychiatric practice. 相似文献

17.

Variance component models for longitudinal count data with baseline information: epilepsy data revisited

Marco Alfò Murray Aitkin 《Statistics and Computing》2006,16(3):231-238

Random effect models have often been used in longitudinal data analysis since they allow for association among repeated measurements due to unobserved heterogeneity. Various approaches have been proposed to extend mixed models for repeated count data to include dependence on baseline counts. Dependence between baseline counts and individual-specific random effects result in a complex form of the (conditional) likelihood. An approximate solution can be achieved ignoring this dependence, but this approach could result in biased parameter estimates and in wrong inferences. We propose a computationally feasible approach to overcome this problem, leaving the random effect distribution unspecified. In this context, we show how the EM algorithm for nonparametric maximum likelihood (NPML) can be extended to deal with dependence of repeated measures on baseline counts. 相似文献

18.

Estimation in large cross random-effect models by data augmentation

D. Clayton & J. Rasbash 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》1999,162(3):425-436

Estimation in mixed linear models is, in general, computationally demanding, since applied problems may involve extensive data sets and large numbers of random effects. Existing computer algorithms are slow and/or require large amounts of memory. These problems are compounded in generalized linear mixed models for categorical data, since even approximate methods involve fitting of a linear mixed model within steps of an iteratively reweighted least squares algorithm. Only in models in which the random effects are hierarchically nested can the computations for fitting these models to large data sets be carried out rapidly. We describe a data augmentation approach to these computational difficulties in which we repeatedly fit an overlapping series of submodels, incorporating the missing terms in each submodel as 'offsets'. The submodels are chosen so that they have a nested random-effect structure, thus allowing maximum exploitation of the computational efficiency which is available in this case. Examples of the use of the algorithm for both metric and discrete responses are discussed, all calculations being carried out using macros within the MLwiN program. 相似文献

19.

Bayesian quantile regression for skew-normal linear mixed models

A. Aghamohammadi M. R. Meshkani 《统计学通讯:理论与方法》2017,46(22):10953-10972

Linear mixed models have been widely used to analyze repeated measures data which arise in many studies. In most applications, it is assumed that both the random effects and the within-subjects errors are normally distributed. This can be extremely restrictive, obscuring important features of within-and among-subject variations. Here, quantile regression in the Bayesian framework for the linear mixed models is described to carry out the robust inferences. We also relax the normality assumption for the random effects by using a multivariate skew-normal distribution, which includes the normal ones as a special case and provides robust estimation in the linear mixed models. For posterior inference, we propose a Gibbs sampling algorithm based on a mixture representation of the asymmetric Laplace distribution and multivariate skew-normal distribution. The procedures are demonstrated by both simulated and real data examples. 相似文献

20.

Learning causal structure from mixed data with missing values using Gaussian copula models

Cui Ruifei Groot Perry Heskes Tom 《Statistics and Computing》2019,29(2):311-333

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the ‘Rank PC’ algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of ‘Rank PC’ and show its high-dimensional consistency. However, when the data are missing at random (MAR), ‘Rank PC’ fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the ‘Copula PC’ algorithm for incomplete data. Simulation study shows that: (1) ‘Copula PC’ estimates a more accurate correlation matrix and causal structure than ‘Rank PC’ under MCAR and, even more so, under MAR and (2) the usage of the effective sample size significantly improves the performance of ‘Rank PC’ and ‘Copula PC.’ We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.

相似文献