期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Multiple imputation for ordinal longitudinal data with monotone missing data patterns

A.Y. Kombo H. Mwambi G. Molenberghs 《Journal of applied statistics》2017,44(2):270-287

Missing data often complicate the analysis of scientific data. Multiple imputation is a general purpose technique for analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns but often complicated by some restrictions like the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable meaning any of the two may be used when faced with scenarios, at least, as the ones presented here. 相似文献

2.

A structured approach to choosing estimands and estimators in longitudinal clinical trials

C. H. Mallinckrodt Q. Lin I. Lipkovich G. Molenberghs 《Pharmaceutical statistics》2012,11(6):456-461

An important evolution in the missing data arena has been the recognition of need for clarity in objectives. The objectives of primary focus in clinical trials can often be categorized as assessing efficacy or effectiveness. The present investigation illustrated a structured framework for choosing estimands and estimators when testing investigational drugs to treat the symptoms of chronic illnesses. Key issues were discussed and illustrated using a reanalysis of the confirmatory trials from a new drug application in depression. The primary analysis used a likelihood‐based approach to assess efficacy: mean change to the planned endpoint of the trial assuming patients stayed on drug. Secondarily, effectiveness was assessed using a multiple imputation approach. The imputation model—derived solely from the placebo group—was used to impute missing values for both the drug and placebo groups. Therefore, this so‐called placebo multiple imputation (a.k.a. controlled imputation) approach assumed patients had reduced benefit from the drug after discontinuing it. Results from the example data provided clear evidence of efficacy for the experimental drug and characterized its effectiveness. Data after discontinuation of study medication were not required for these analyses. Given the idiosyncratic nature of drug development, no estimand or approach is universally appropriate. However, the general practice of pairing efficacy and effectiveness estimands may often be useful in understanding the overall risks and benefits of a drug. Controlled imputation approaches, such as placebo multiple imputation, can be a flexible and transparent framework for formulating primary analyses of effectiveness estimands and sensitivity analyses for efficacy estimands. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

3.

Multidimensional contingency tables with missing data

Zhi Geng 《统计学通讯:理论与方法》2013,42(12):4137-4146

This paper is concerned wim ine maximum likelihood estimation and the likelihood ratio test for hierarchical loglinear models of multidimensional contingency tables with missing data. The problems of estimation and test for a high dimensional contingency table can be reduced into those for a class of low dimensional tables. In some cases, the incomplete data in the high dimensional table can become complete in the low dimensional tables through the reduction can indicate how much the incomplete data contribute to the estimation and the test. 相似文献

4.

Using multiple imputation to estimate cumulative distribution functions in longitudinal data analysis with data missing at random

Phillip Dinh 《Pharmaceutical statistics》2013,12(5):260-267

In longitudinal clinical studies, after randomization at baseline, subjects are followed for a period of time for development of symptoms. The interested inference could be the mean change from baseline to a particular visit in some lab values, the proportion of responders to some threshold category at a particular visit post baseline, or the time to some important event. However, in some applications, the interest may be in estimating the cumulative distribution function (CDF) at a fixed time point post baseline. When the data are fully observed, the CDF can be estimated by the empirical CDF. When patients discontinue prematurely during the course of the study, the empirical CDF cannot be directly used. In this paper, we use multiple imputation as a way to estimate the CDF in longitudinal studies when data are missing at random. The validity of the method is assessed on the basis of the bias and the Kolmogorov–Smirnov distance. The results suggest that multiple imputation yields less bias and less variability than the often used last observation carried forward method. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

5.

Review of guidelines and literature for handling missing data in longitudinal clinical trials with a case study 总被引：1，自引：0，他引：1

Liu M Wei L Zhang J 《Pharmaceutical statistics》2006,5(1):7-18

Missing data in clinical trials are inevitable. We highlight the ICH guidelines and CPMP points to consider on missing data. Specifically, we outline how we should consider missing data issues when designing, planning and conducting studies to minimize missing data impact. We also go beyond the coverage of the above two documents, provide a more detailed review of the basic concepts of missing data and frequently used terminologies, and examples of the typical missing data mechanism, and discuss technical details and literature for several frequently used statistical methods and associated software. Finally, we provide a case study where the principles outlined in this paper are applied to one clinical program at protocol design, data analysis plan and other stages of a clinical trial. 相似文献

6.

Missing data techniques for multilevel data: implications of model misspecification

Anne C. Black Ofer Harel D. Betsy McCoach 《Journal of applied statistics》2011,38(9):1845-1865

When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD. 相似文献

7.

Multiple imputation of missing data with ante-dependence covariance structure

Paul Zhang 《Journal of applied statistics》2005,32(2):141-155

A controlled clinical trial was conducted to investigate the efficacy effect of a chemical compound in the treatment of Premenstrual Dysphoric Disorder (PMDD). The data from the trial showed a non-monotone pattern of missing data and an ante-dependence covariance structure. A new analytical method for imputing the missing data with the ante-dependence covariance is proposed. The PMDD data are analysed by the non-imputation method and two imputation methods: the proposed method and the MCMC method. 相似文献

8.

Semi-empirical likelihood inference for the ROC curve with missing data

Xiaoxia Liu Yichuan Zhao 《Journal of statistical planning and inference》2012

The receiver operating characteristic (ROC) curve is one of the most commonly used methods to compare the diagnostic performance of two or more laboratory or diagnostic tests. In this paper, we propose semi-empirical likelihood based confidence intervals for ROC curves of two populations, where one population is parametric and the other one is non-parametric and both have missing data. After imputing missing values, we derive the semi-empirical likelihood ratio statistic and the corresponding likelihood equations. It is shown that the log-semi-empirical likelihood ratio statistic is asymptotically scaled chi-squared. The estimating equations are solved simultaneously to obtain the estimated lower and upper bounds of semi-empirical likelihood confidence intervals. We conduct extensive simulation studies to evaluate the finite sample performance of the proposed empirical likelihood confidence intervals with various sample sizes and different missing probabilities. 相似文献

9.

Comparison of several imputation methods for missing baseline data in propensity scores analysis of binary outcome

Brenda J. Crowe Ilya A. Lipkovich Ouhong Wang 《Pharmaceutical statistics》2010,9(4):269-279

We performed a simulation study comparing the statistical properties of the estimated log odds ratio from propensity scores analyses of a binary response variable, in which missing baseline data had been imputed using a simple imputation scheme (Treatment Mean Imputation), compared with three ways of performing multiple imputation (MI) and with a Complete Case analysis. MI that included treatment (treated/untreated) and outcome (for our analyses, outcome was adverse event [yes/no]) in the imputer's model had the best statistical properties of the imputation schemes we studied. MI is feasible to use in situations where one has just a few outcomes to analyze. We also found that Treatment Mean Imputation performed quite well and is a reasonable alternative to MI in situations where it is not feasible to use MI. Treatment Mean Imputation performed better than MI methods that did not include both the treatment and outcome in the imputer's model. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

10.

Latent class models for longitudinal studies of the elderly with data missing at random

Beth A Reboussin Michael E Miller Kurt K Lohman & Thomas R Ten Have 《Journal of the Royal Statistical Society. Series C, Applied statistics》2002,51(1):69-90

The elderly population in the USA is expected to double in size by the year 2025, making longitudinal health studies of this population of increasing importance. The degree of loss to follow-up in studies of the elderly, which is often because elderly people cannot remain in the study, enter a nursing home or die, make longitudinal studies of this population problematic. We propose a latent class model for analysing multiple longitudinal binary health outcomes with multiple-cause non-response when the data are missing at random and a non-likelihood-based analysis is performed. We extend the estimating equations approach of Robins and co-workers to latent class models by reweighting the multiple binary longitudinal outcomes by the inverse probability of being observed. This results in consistent parameter estimates when the probability of non-response depends on observed outcomes and covariates (missing at random) assuming that the model for non-response is correctly specified. We extend the non-response model so that institutionalization, death and missingness due to failure to locate, refusal or incomplete data each have their own set of non-response probabilities. Robust variance estimates are derived which account for the use of a possibly misspecified covariance matrix, estimation of missing data weights and estimation of latent class measurement parameters. This approach is then applied to a study of lower body function among a subsample of the elderly participating in the 6-year Longitudinal Study of Aging. 相似文献

11.

MODEL SELECTION FOR PENALIZED SPLINE SMOOTHING USING AKAIKE INFORMATION CRITERIA

Carrie Wager Florin Vaida Göran Kauermann 《Australian & New Zealand Journal of Statistics》2007,49(2):173-190

Two different forms of Akaike's information criterion (AIC) are compared for selecting the smooth terms in penalized spline additive mixed models. The conditional AIC (cAIC) has been used traditionally as a criterion for both estimating penalty parameters and selecting covariates in smoothing, and is based on the conditional likelihood given the smooth mean and on the effective degrees of freedom for a model fit. By comparison, the marginal AIC (mAIC) is based on the marginal likelihood from the mixed‐model formulation of penalized splines which has recently become popular for estimating smoothing parameters. To the best of the authors' knowledge, the use of mAIC for selecting covariates for smoothing in additive models is new. In the competing models considered for selection, covariates may have a nonlinear effect on the response, with the possibility of group‐specific curves. Simulations are used to compare the performance of cAIC and mAIC in model selection settings that have correlated and hierarchical smooth terms. In moderately large samples, both formulations of AIC perform extremely well at detecting the function that generated the data. The mAIC does better for simple functions, whereas the cAIC is more sensitive to detecting a true model that has complex and hierarchical terms. 相似文献

12.

Distribution estimation with auxiliary information for missing data

Xu Liu Peixin LiuYong Zhou 《Journal of statistical planning and inference》2011,141(2):711-724

There is much literature on statistical inference for distribution under missing data, but surprisingly very little previous attention has been paid to missing data in the context of estimating distribution with auxiliary information. In this article, the auxiliary information with missing data is proposed. We use Zhou, Wan and Wang's method (2008) to mitigate the effects of missing data through a reformulation of the estimating equations, imputed through a semi-parametric procedure. Whence we can estimate distribution and the τth quantile of the distribution by taking auxiliary information into account. Asymptotic properties of the distribution estimator and corresponding sample quantile are derived and analyzed. The distribution estimators based on our method are found to significantly outperform the corresponding estimators without auxiliary information. Some simulation studies are conducted to illustrate the finite sample performance of the proposed estimators. 相似文献

13.

Modern variable selection for longitudinal semi-parametric models with missing data

J. Kowalski S. Hao T. Chen Y. Liang J. Liu L. Ge 《Journal of applied statistics》2018,45(14):2548-2562

Penalized methods for variable selection such as the Smoothly Clipped Absolute Deviation penalty have been increasingly applied to aid variable section in regression analysis. Much of the literature has focused on parametric models, while a few recent studies have shifted the focus and developed their applications for the popular semi-parametric, or distribution-free, generalized estimating equations (GEEs) and weighted GEE (WGEE). However, although the WGEE is composed of one main and one missing-data module, available methods only focus on the main module, with no variable selection for the missing-data module. In this paper, we develop a new approach to further extend the existing methods to enable variable selection for both modules. The approach is illustrated by both real and simulated study data. 相似文献

14.

Analysis of longitudinal data with non-ignorable non-monotone missing values 总被引：2，自引：0，他引：2

A. B. Troxel D. P. Harrington & S. R. Lipsitz 《Journal of the Royal Statistical Society. Series C, Applied statistics》1998,47(3):425-438

A full likelihood method is proposed to analyse continuous longitudinal data with non-ignorable (informative) missing values and non-monotone patterns. The problem arose in a breast cancer clinical trial where repeated assessments of quality of life were collected: patients rated their coping ability during and after treatment. We allow the missingness probabilities to depend on unobserved responses, and we use a multivariate normal model for the outcomes. A first-order Markov dependence structure for the responses is a natural choice and facilitates the construction of the likelihood; estimates are obtained via the Nelder–Mead simplex algorithm. Computations are difficult and become intractable with more than three or four assessments. Applying the method to the quality-of-life data results in easily interpretable estimates, confirms the suspicion that the data are non-ignorably missing and highlights the likely bias of standard methods. Although treatment comparisons are not affected here, the methods are useful for obtaining unbiased means and estimating trends over time. 相似文献

15.

Missing data methods for arbitrary missingness with small samples

Daniel McNeish 《Journal of applied statistics》2017,44(1):24-39

Missing data are a prevalent and widespread data analytic issue and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models; however, one such context that has yet to receive much attention in the literature is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with monotone missingness commonly found in longitudinal studies or have investigated the performance of a single method to handle arbitrary missingness with small samples but studies have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study conducts a simulation study to compare and assess the small sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation unanimously performed best of the methods examined in the conditions under study. 相似文献

16.

Inference methods for saturated models in longitudinal clinical trials with incomplete binary data

Song JX 《Pharmaceutical statistics》2006,5(4):295-304

In the longitudinal studies with binary response, it is often of interest to estimate the percentage of positive responses at each time point and the percentage of having at least one positive response by each time point. When missing data exist, the conventional method based on observed percentages could result in erroneous estimates. This study demonstrates two methods of using expectation-maximization (EM) and data augmentation (DA) algorithms in the estimation of the marginal and cumulative probabilities for incomplete longitudinal binary response data. Both methods provide unbiased estimates when the missingness mechanism is missing at random (MAR) assumption. Sensitivity analyses have been performed for cases when the MAR assumption is in question. 相似文献

17.

Empirical likelihood-based inference in nonlinear regression models with missing responses at random

Nian-Sheng Tang Pu-Ying Zhao 《Statistics》2013,47(6):1141-1159

This paper investigates the estimations of regression parameters and response mean in nonlinear regression models in the presence of missing response variables that are missing with missingness probabilities depending on covariates. We propose four empirical likelihood (EL)-based estimators for the regression parameters and the response mean. The resulting estimators are shown to be consistent and asymptotically normal under some general assumptions. To construct the confidence regions for the regression parameters as well as the response mean, we develop four EL ratio statistics, which are proven to have the χ² distribution asymptotically. Simulation studies and an artificial data set are used to illustrate the proposed methodologies. Empirical results show that the EL method behaves better than the normal approximation method and that the coverage probabilities and average lengths depend on the selection probability function. 相似文献

18.

General location multivariate latent variable models for mixed correlated bounded continuous,ordinal, and nominal responses with non-ignorable missing data

Elham Tabrizi Ehsan Bahrami Samani Mojtaba Ganjali 《Journal of applied statistics》2021,48(5):765

Using a multivariate latent variable approach, this article proposes some new general models to analyze the correlated bounded continuous and categorical (nominal or/and ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses that we motivated by analyzing data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some bounded continuous responses are replaced for continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missing mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets. 相似文献

19.

A gradient-based algorithm for semiparametric models with missing covariates

《Journal of Statistical Computation and Simulation》2012,82(4):381-390

In the parametric regression model, the covariate missing problem under missing at random is considered. It is often desirable to use flexible parametric or semiparametric models for the covariate distribution, which can reduce a potential misspecification problem. Recently, a completely nonparametric approach was developed by [H.Y. Chen, Nonparametric and semiparametric models for missing covariates in parameter regression, J. Amer. Statist. Assoc. 99 (2004), pp. 1176–1189; Z. Zhang and H.E. Rockette, On maximum likelihood estimation in parametric regression with missing covariates, J. Statist. Plann. Inference 47 (2005), pp. 206–223]. Although it does not require a model for the covariate distribution or the missing data mechanism, the proposed method assumes that the covariate distribution is supported only by observed values. Consequently, their estimator is a restricted maximum likelihood estimator (MLE) rather than the global MLE. In this article, we show the restricted semiparametric MLE could be very misleading in some cases. We discuss why this problem occurs and suggest an algorithm to obtain the global MLE. Then, we assess the performance of the proposed method via some simulation experiments. 相似文献

20.

Reducing bias for maximum approximate conditional likelihood estimator with general missing data mechanism

Jiwei Zhao 《Journal of nonparametric statistics》2017,29(3):577-593

相似文献