期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Multiple imputation: an alternative to top coding for statistical disclosure control

Di An Roderick J. A. Little 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2007,170(4):923-940

Summary. Top coding of extreme values of variables like income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternative methods to top coding for statistical disclosure control that are based on multiple imputation. We show in simulation studies that the multiple-imputation methods provide better inferences of the publicly released data than top coding, using straightforward multiple-imputation methods of analysis, while maintaining good statistical disclosure control properties. We illustrate the methods on data from the 1995 Chinese household income project. 相似文献

2.

A multiple imputation approach to nonlinear mixed-effects models with covariate measurement errors and missing values

Wei Liu Shuyou Li 《Journal of applied statistics》2015,42(3):463-476

In longitudinal studies, nonlinear mixed-effects models have been widely applied to describe the intra- and the inter-subject variations in data. The inter-subject variation usually receives great attention and it may be partially explained by time-dependent covariates. However, some covariates may be measured with substantial errors and may contain missing values. We proposed a multiple imputation method, implemented by a Markov Chain Monte-Carlo method along with Gibbs sampler, to address the covariate measurement errors and missing data in nonlinear mixed-effects models. The multiple imputation method is illustrated in a real data example. Simulation studies show that the multiple imputation method outperforms the commonly used naive methods. 相似文献

3.

Missing data: Discussion points from the PSI missing data expert group

Tomasz Burzykowski James Carpenter Corneel Coens Daniel Evans Lesley France Mike Kenward Peter Lane James Matcham David Morgan Alan Phillips James Roger Brian Sullivan Ian White Ly‐Mee Yu of the PSI Missing Data Expert Group 《Pharmaceutical statistics》2010,9(4):288-297

The Points to Consider Document on Missing Data was adopted by the Committee of Health and Medicinal Products (CHMP) in December 2001. In September 2007 the CHMP issued a recommendation to review the document, with particular emphasis on summarizing and critically appraising the pattern of drop‐outs, explaining the role and limitations of the ‘last observation carried forward’ method and describing the CHMP's cautionary stance on the use of mixed models. In preparation for the release of the updated guidance document, statisticians in the Pharmaceutical Industry held a one‐day expert group meeting in September 2008. Topics that were debated included minimizing the extent of missing data and understanding the missing data mechanism, defining the principles for handling missing data and understanding the assumptions underlying different analysis methods. A clear message from the meeting was that at present, biostatisticians tend only to react to missing data. Limited pro‐active planning is undertaken when designing clinical trials. Missing data mechanisms for a trial need to be considered during the planning phase and the impact on the objectives assessed. Another area for improvement is in the understanding of the pattern of missing data observed during a trial and thus the missing data mechanism via the plotting of data; for example, use of Kaplan–Meier curves looking at time to withdrawal. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

4.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect. 相似文献

5.

Missing data sensitivity analysis for recurrent event data using controlled imputation

Oliver N. Keene James H. Roger Benjamin F. Hartley Michael G. Kenward 《Pharmaceutical statistics》2014,13(4):258-264

Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference‐based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

6.

Examining the robustness of fully synthetic data techniques for data with binary variables

《Journal of Statistical Computation and Simulation》2012,82(6):609-624

There is a growing demand for public use data while at the same time there are increasing concerns about the privacy of personal information. One proposed method for accomplishing both goals is to release data sets that do not contain real values but yield the same inferences as the actual data. The idea is to view confidential data as missing and use multiple imputation techniques to create synthetic data sets. In this article, we compare techniques for creating synthetic data sets in simple scenarios with a binary variable. 相似文献

7.

Control‐based imputation for sensitivity analyses in informative censoring for recurrent event data

下载免费PDF全文

Fei Gao Guanghan F. Liu Donglin Zeng Lei Xu Bridget Lin Guoqing Diao Gregory Golm Joseph F. Heyse Joseph G. Ibrahim 《Pharmaceutical statistics》2017,16(6):424-432

In clinical trials, missing data commonly arise through nonadherence to the randomized treatment or to study procedure. For trials in which recurrent event endpoints are of interests, conventional analyses using the proportional intensity model or the count model assume that the data are missing at random, which cannot be tested using the observed data alone. Thus, sensitivity analyses are recommended. We implement the control‐based multiple imputation as sensitivity analyses for the recurrent event data. We model the recurrent event using a piecewise exponential proportional intensity model with frailty and sample the parameters from the posterior distribution. We impute the number of events after dropped out and correct the variance estimation using a bootstrap procedure. We apply the method to an application of sitagliptin study. 相似文献

8.

Missing data techniques for multilevel data: implications of model misspecification

Anne C. Black Ofer Harel D. Betsy McCoach 《Journal of applied statistics》2011,38(9):1845-1865

When modeling multilevel data, it is important to accurately represent the interdependence of observations within clusters. Ignoring data clustering may result in parameter misestimation. However, it is not well established to what degree parameter estimates are affected by model misspecification when applying missing data techniques (MDTs) to incomplete multilevel data. We compare the performance of three MDTs with incomplete hierarchical data. We consider the impact of imputation model misspecification on the quality of parameter estimates by employing multiple imputation under assumptions of a normal model (MI/NM) with two-level cross-sectional data when values are missing at random on the dependent variable at rates of 10%, 30%, and 50%. Five criteria are used to compare estimates from MI/NM to estimates from MI assuming a linear mixed model (MI/LMM) and maximum likelihood estimation to the same incomplete data sets. With 10% missing data (MD), techniques performed similarly for fixed-effects estimates, but variance components were biased with MI/NM. Effects of model misspecification worsened at higher rates of MD, with the hierarchical structure of the data markedly underrepresented by biased variance component estimates. MI/LMM and maximum likelihood provided generally accurate and unbiased parameter estimates but performance was negatively affected by increased rates of MD. 相似文献

9.

基于数据缺失率和缺失模式的多重插补误差研究

彭海艳李意芝孟利军《统计与决策》2022,(1)

文章通过多重插补方法对不同缺失率和缺失模式的多变量缺失样本进行插补,研究了多重插补误差与缺失率和缺失模式的依赖关系。结果表明,当缺失率为0~15%时,多重插补误差与缺失率呈线性关系;当缺失率大于15%时,两者呈偏离线性关系。多重插补误差与缺失模式的方差均值比呈正相关性,当方差均值比越大时,误差也越大。相似文献

10.

Using multiple imputation to estimate cumulative distribution functions in longitudinal data analysis with data missing at random

Phillip Dinh 《Pharmaceutical statistics》2013,12(5):260-267

In longitudinal clinical studies, after randomization at baseline, subjects are followed for a period of time for development of symptoms. The interested inference could be the mean change from baseline to a particular visit in some lab values, the proportion of responders to some threshold category at a particular visit post baseline, or the time to some important event. However, in some applications, the interest may be in estimating the cumulative distribution function (CDF) at a fixed time point post baseline. When the data are fully observed, the CDF can be estimated by the empirical CDF. When patients discontinue prematurely during the course of the study, the empirical CDF cannot be directly used. In this paper, we use multiple imputation as a way to estimate the CDF in longitudinal studies when data are missing at random. The validity of the method is assessed on the basis of the bias and the Kolmogorov–Smirnov distance. The results suggest that multiple imputation yields less bias and less variability than the often used last observation carried forward method. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

11.

Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data

《Journal of Statistical Computation and Simulation》2012,82(17):3498-3511

Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also been increasingly popular problem in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. Particularly, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. 相似文献

12.

Multiple Imputation for the Cox Proportional Hazards Model with Missing Covariates

Paik Myunghee Cho 《Lifetime data analysis》1997,3(3):289-298

We present three multiple imputation estimates for the Cox model with missing covariates. Two of the suggested estimates are asymptotically equivalent to estimates in the literature when the number of multiple imputations approaches infinity. The third estimate can be implemented using standard software that could handle time-varying covariates. This revised version was published online in July 2006 with corrections to the Cover Date. 相似文献

13.

Reference‐based sensitivity analysis for time‐to‐event data

Andrew Atkinson Michael G. Kenward Tim Clayton James R. Carpenter 《Pharmaceutical statistics》2019,18(6):645-658

The analysis of time‐to‐event data typically makes the censoring at random assumption, ie, that—conditional on covariates in the model—the distribution of event times is the same, whether they are observed or unobserved (ie, right censored). When patients who remain in follow‐up stay on their assigned treatment, then analysis under this assumption broadly addresses the de jure, or “while on treatment strategy” estimand. In such cases, we may well wish to explore the robustness of our inference to more pragmatic, de facto or “treatment policy strategy,” assumptions about the behaviour of patients post‐censoring. This is particularly the case when censoring occurs because patients change, or revert, to the usual (ie, reference) standard of care. Recent work has shown how such questions can be addressed for trials with continuous outcome data and longitudinal follow‐up, using reference‐based multiple imputation. For example, patients in the active arm may have their missing data imputed assuming they reverted to the control (ie, reference) intervention on withdrawal. Reference‐based imputation has two advantages: (a) it avoids the user specifying numerous parameters describing the distribution of patients' postwithdrawal data and (b) it is, to a good approximation, information anchored, so that the proportion of information lost due to missing data under the primary analysis is held constant across the sensitivity analyses. In this article, we build on recent work in the survival context, proposing a class of reference‐based assumptions appropriate for time‐to‐event data. We report a simulation study exploring the extent to which the multiple imputation estimator (using Rubin's variance formula) is information anchored in this setting and then illustrate the approach by reanalysing data from a randomized trial, which compared medical therapy with angioplasty for patients presenting with angina. 相似文献

14.

A structured approach to choosing estimands and estimators in longitudinal clinical trials

C. H. Mallinckrodt Q. Lin I. Lipkovich G. Molenberghs 《Pharmaceutical statistics》2012,11(6):456-461

An important evolution in the missing data arena has been the recognition of need for clarity in objectives. The objectives of primary focus in clinical trials can often be categorized as assessing efficacy or effectiveness. The present investigation illustrated a structured framework for choosing estimands and estimators when testing investigational drugs to treat the symptoms of chronic illnesses. Key issues were discussed and illustrated using a reanalysis of the confirmatory trials from a new drug application in depression. The primary analysis used a likelihood‐based approach to assess efficacy: mean change to the planned endpoint of the trial assuming patients stayed on drug. Secondarily, effectiveness was assessed using a multiple imputation approach. The imputation model—derived solely from the placebo group—was used to impute missing values for both the drug and placebo groups. Therefore, this so‐called placebo multiple imputation (a.k.a. controlled imputation) approach assumed patients had reduced benefit from the drug after discontinuing it. Results from the example data provided clear evidence of efficacy for the experimental drug and characterized its effectiveness. Data after discontinuation of study medication were not required for these analyses. Given the idiosyncratic nature of drug development, no estimand or approach is universally appropriate. However, the general practice of pairing efficacy and effectiveness estimands may often be useful in understanding the overall risks and benefits of a drug. Controlled imputation approaches, such as placebo multiple imputation, can be a flexible and transparent framework for formulating primary analyses of effectiveness estimands and sensitivity analyses for efficacy estimands. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

15.

The impact of missing data and how it is handled on the rate of false-positive results in drug development

Barnes SA Mallinckrodt CH Lindborg SR Carter MK 《Pharmaceutical statistics》2008,7(3):215-225

In drug development, a common choice for the primary analysis is to assess mean changes via analysis of (co)variance with missing data imputed by carrying the last or baseline observations forward (LOCF, BOCF). These approaches assume that data are missing completely at random (MCAR). Multiple imputation (MI) and likelihood-based repeated measures (MMRM) are less restrictive as they assume data are missing at random (MAR). Nevertheless, LOCF and BOCF remain popular, perhaps because it is thought that the bias in these methods lead to protection against falsely concluding that a drug is more effective than the control. We conducted a simulation study that compared the rate of false positive results or regulatory risk error (RRE) from BOCF, LOCF, MI, and MMRM in 32 scenarios that were generated from a 2(5) full factorial arrangement with data missing due to a missing not at random (MNAR) mechanism. Both BOCF and LOCF inflated RRE were compared to MI and MMRM. In 12 of the 32 scenarios, BOCF yielded inflated RRE compared with eight scenarios for LOCF, three scenarios for MI and four scenarios for MMRM. In no situation did BOCF or LOCF provide adequate control of RRE when MI and MMRM did not. Both MI and MMRM are better choices than either BOCF or LOCF for the primary analysis. 相似文献

16.

Missing data in principal component analysis of questionnaire data: a comparison of methods

《Journal of Statistical Computation and Simulation》2012,82(11):2298-2315

Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As in any other statistical technique, missing data may both complicate its execution and interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that except for LD, all methods give about equally good results for realistic percentages of missing data. Therefore, the choice of a procedure can be based on the ease of application or purely the convenience of availability of a technique. 相似文献

17.

Imputation techniques for incomplete data in quadratic discriminant analysis

《Journal of Statistical Computation and Simulation》2012,82(6):863-877

We have compared the efficacy of five imputation algorithms readily available in SAS for the quadratic discriminant function. Here, we have generated several different parametric-configuration training data with missing data, including monotone missing-at-random observations, and used a Monte Carlo simulation to examine the expected probabilities of misclassification for the two-class quadratic statistical discrimination problem under five different imputation methods. Specifically, we have compared the efficacy of the complete observation-only method and the mean substitution, regression, predictive mean matching, propensity score, and Markov Chain Monte Carlo (MCMC) imputation methods. We found that the MCMC and propensity score multiple imputation approaches are, in general, superior to the other imputation methods for the configurations and training-sample sizes we considered. 相似文献

18.

A numerical study of multiple imputation methods using nonparametric multivariate outlier identifiers and depth-based performance criteria with clinical laboratory data

《Journal of Statistical Computation and Simulation》2012,82(5):547-560

It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of MI as well as a new proposed one, nine in all, in a setting of several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principle component analysis are also included in the study. Consequently, not only the comparison of imputation methods but also the comparison of outlier detection methods is accomplished in this study. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally. 相似文献

19.

Treatment policy estimands for recurrent event data using data collected after cessation of randomised treatment

James H. Roger Daniel J. Bratton Bhabita Mayer Juan J. Abellan Oliver N. Keene 《Pharmaceutical statistics》2019,18(1):85-95

In the past, many clinical trials have withdrawn subjects from the study when they prematurely stopped their randomised treatment and have therefore only collected ‘on‐treatment’ data. Thus, analyses addressing a treatment policy estimand have been restricted to imputing missing data under assumptions drawn from these data only. Many confirmatory trials are now continuing to collect data from subjects in a study even after they have prematurely discontinued study treatment as this event is irrelevant for the purposes of a treatment policy estimand. However, despite efforts to keep subjects in a trial, some will still choose to withdraw. Recent publications for sensitivity analyses of recurrent event data have focused on the reference‐based imputation methods commonly applied to continuous outcomes, where imputation for the missing data for one treatment arm is based on the observed outcomes in another arm. However, the existence of data from subjects who have prematurely discontinued treatment but remained in the study has now raised the opportunity to use this ‘off‐treatment’ data to impute the missing data for subjects who withdraw, potentially allowing more plausible assumptions for the missing post‐study‐withdrawal data than reference‐based approaches. In this paper, we introduce a new imputation method for recurrent event data in which the missing post‐study‐withdrawal event rate for a particular subject is assumed to reflect that observed from subjects during the off‐treatment period. The method is illustrated in a trial in chronic obstructive pulmonary disease (COPD) where the primary endpoint was the rate of exacerbations, analysed using a negative binomial model. 相似文献

20.

Missing data in clinical trials: from clinical assumptions to statistical analysis using pattern mixture models

Bohdana Ratitch Michael O'Kelly Robert Tosiello 《Pharmaceutical statistics》2013,12(6):337-347

The need to use rigorous, transparent, clearly interpretable, and scientifically justified methodology for preventing and dealing with missing data in clinical trials has been a focus of much attention from regulators, practitioners, and academicians over the past years. New guidelines and recommendations emphasize the importance of minimizing the amount of missing data and carefully selecting primary analysis methods on the basis of assumptions regarding the missingness mechanism suitable for the study at hand, as well as the need to stress‐test the results of the primary analysis under different sets of assumptions through a range of sensitivity analyses. Some methods that could be effectively used for dealing with missing data have not yet gained widespread usage, partly because of their underlying complexity and partly because of lack of relatively easy approaches to their implementation. In this paper, we explore several strategies for missing data on the basis of pattern mixture models that embody clear and realistic clinical assumptions. Pattern mixture models provide a statistically reasonable yet transparent framework for translating clinical assumptions into statistical analyses. Implementation details for some specific strategies are provided in an Appendix (available online as Supporting Information), whereas the general principles of the approach discussed in this paper can be used to implement various other analyses with different sets of assumptions regarding missing data. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献