期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A comparison of various software tools for dealing with missing data via imputation

《Journal of Statistical Computation and Simulation》2012,82(11):1653-1675

In real-life situations, we often encounter data sets containing missing observations. Statistical methods that address missingness have been extensively studied in recent years. One of the more popular approaches involves imputation of the missing values prior to the analysis, thereby rendering the data complete. Imputation broadly encompasses an entire scope of techniques that have been developed to make inferences about incomplete data, ranging from very simple strategies (e.g. mean imputation) to more advanced approaches that require estimation, for instance, of posterior distributions using Markov chain Monte Carlo methods. Additional complexity arises when the number of missingness patterns increases and/or when both categorical and continuous random variables are involved. Implementation of routines, procedures, or packages capable of generating imputations for incomplete data are now widely available. We review some of these in the context of a motivating example, as well as in a simulation study, under two missingness mechanisms (missing at random and missing not at random). Thus far, evaluation of existing implementations have frequently centred on the resulting parameter estimates of the prescribed model of interest after imputing the missing data. In some situations, however, interest may very well be on the quality of the imputed values at the level of the individual – an issue that has received relatively little attention. In this paper, we focus on the latter to provide further insight about the performance of the different routines, procedures, and packages in this respect. 相似文献

2.

Coping with missing data in phase III pivotal registration trials: Tolvaptan in subjects with kidney disease,a case study

下载免费PDF全文

John Ouyang Kevin J. Carroll Gary Koch Junfang Li 《Pharmaceutical statistics》2017,16(4):250-266

Missing data cause challenging issues, particularly in phase III registration trials, as highlighted by the European Medicines Agency (EMA) and the US National Research Council. We explore, as a case study, how the issues from missing data were tackled in a double‐blind phase III trial in subjects with autosomal dominant polycystic kidney disease. A total of 1445 subjects were randomized in a 2:1 ratio to receive active treatment (tolvaptan), or placebo. The primary outcome, the rate of change in total kidney volume, favored tolvaptan (P < .0001). The key secondary efficacy endpoints of clinical progression of disease and rate of decline in kidney function also favored tolvaptan. However, as highlighted by Food and Drug Administration and EMA, the interpretation of results was hampered by a high number of unevenly distributed dropouts, particularly early dropouts. In this paper, we outline the analyses undertaken to address the issue of missing data thoroughly. “Tipping point analyses” were performed to explore how extreme and detrimental outcomes among subjects with missing data must be to overturn the positive treatment effect attained in those subjects who had complete data. Nonparametric rank‐based analyses were also performed accounting for missing data. In conclusion, straightforward and transparent analyses directly taking into account missing data convincingly support the robustness of the preplanned analyses on the primary and secondary endpoints. Tolvaptan was confirmed to be effective in slowing total kidney volume growth, which is considered an efficacy endpoint by EMA, and in lessening the decline in renal function in patients with autosomal dominant polycystic kidney disease. 相似文献

3.

Comparison of algorithms for replacing missing data in discriminant analysis

J.Twedt Daniel D.S. Gill 《统计学通讯:理论与方法》2013,42(6):1567-1578

We examined the impact of different methods for replacing missing data in discriminant analyses conducted on randomly generated samples from multivariate normal and non-normal distributions. The probabilities of correct classification were obtained for these discriminant analyses before and after randomly deleting data as well as after deleted data were replaced using: (1) variable means, (2) principal component projections, and (3) the EM algorithm. Populations compared were: (1) multivariate normal with covariance matrices ∑₁=∑₂, (2) multivariate normal with ∑₁≠∑₂ and (3) multivariate non-normal with ∑₁=∑₂. Differences in the probabilities of correct classification were most evident for populations with small Mahalanobis distances or high proportions of missing data. The three replacement methods performed similarly but all were better than non - replacement. 相似文献

4.

A structured framework for assessing sensitivity to missing data assumptions in longitudinal clinical trials

C.H. Mallinckrodt Q. Lin M. Molenberghs 《Pharmaceutical statistics》2013,12(1):1-6

The objective of this research was to demonstrate a framework for drawing inference from sensitivity analyses of incomplete longitudinal clinical trial data via a re‐analysis of data from a confirmatory clinical trial in depression. A likelihood‐based approach that assumed missing at random (MAR) was the primary analysis. Robustness to departure from MAR was assessed by comparing the primary result to those from a series of analyses that employed varying missing not at random (MNAR) assumptions (selection models, pattern mixture models and shared parameter models) and to MAR methods that used inclusive models. The key sensitivity analysis used multiple imputation assuming that after dropout the trajectory of drug‐treated patients was that of placebo treated patients with a similar outcome history (placebo multiple imputation). This result was used as the worst reasonable case to define the lower limit of plausible values for the treatment contrast. The endpoint contrast from the primary analysis was ? 2.79 (p = .013). In placebo multiple imputation, the result was ? 2.17. Results from the other sensitivity analyses ranged from ? 2.21 to ? 3.87 and were symmetrically distributed around the primary result. Hence, no clear evidence of bias from missing not at random data was found. In the worst reasonable case scenario, the treatment effect was 80% of the magnitude of the primary result. Therefore, it was concluded that a treatment effect existed. The structured sensitivity framework of using a worst reasonable case result based on a controlled imputation approach with transparent and debatable assumptions supplemented a series of plausible alternative models under varying assumptions was useful in this specific situation and holds promise as a generally useful framework. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

5.

Ian R. White 《Research Synthesis Methods》2023,14(1):137-139

The paper by Lee and Beretvas (doi: 10.1002/jrsm.1585 ) described a well-executed simulation study comparing ‘modern’ with ‘ad hoc’ methods for performing meta-regression when some covariates are incomplete. However, they drew practical conclusions after simulating data under a single missing data mechanism which favoured the ‘modern’ methods, while other missing data mechanisms would have favoured the ‘ad hoc’ methods. Broad recommendations about methods to use in practice should instead be based on simulation studies using a range of plausible data-generating mechanisms. This range must represent what is believed likely to occur in practice, and not what is convenient for statistical analysis. 相似文献

6.

An application of the mixed-effects model and pattern mixture model to treatment groups with differential missingness suspected not-missing-at-random

Masahiko Gosho Kazushi Maruo 《Pharmaceutical statistics》2021,20(1):93-108

Likelihood-based, mixed-effects models for repeated measures (MMRMs) are occasionally used in primary analyses for group comparisons of incomplete continuous longitudinal data. Although MMRM analysis is generally valid under missing-at-random assumptions, it is invalid under not-missing-at-random (NMAR) assumptions. We consider the possibility of bias of estimated treatment effect using standard MMRM analysis in a motivational case, and propose simple and easily implementable pattern mixture models within the framework of mixed-effects modeling, to handle the NMAR data with differential missingness between treatment groups. The proposed models are a new form of pattern mixture model that employ a categorical time variable when modeling the outcome and a continuous time variable when modeling the missingness-data patterns. The models can directly provide an overall estimate of the treatment effect of interest using the average of the distribution of the missingness indicator and a categorical time variable in the same manner as MMRM analysis. Our simulation results indicate that the bias of the treatment effect for MMRM analysis was considerably larger than that for the pattern mixture model analysis under NMAR assumptions. In the case study, it would be dangerous to interpret only the results of the MMRM analysis, and the proposed pattern mixture model would be useful as a sensitivity analysis for treatment effect evaluation. 相似文献

7.

An extension of the placebo‐based pattern‐mixture model

Kaifeng Lu 《Pharmaceutical statistics》2014,13(2):103-109

Pattern‐mixture models provide a general and flexible framework for sensitivity analyses of nonignorable missing data in longitudinal studies. The placebo‐based pattern‐mixture model handles missing data in a transparent and clinically interpretable manner. We extend this model to include a sensitivity parameter that characterizes the gradual departure of the missing data mechanism from being missing at random toward being missing not at random under the standard placebo‐based pattern‐mixture model. We derive the treatment effect implied by the extended model. We propose to utilize the primary analysis based on a mixed‐effects model for repeated measures to draw inference about the treatment effect under the extended placebo‐based pattern‐mixture model. We use simulation studies to confirm the validity of the proposed method. We apply the proposed method to a clinical study of major depressive disorders. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

8.

Efficient inverse probability weighting method for quantile regression with nonignorable missing data

Pu-Ying Zhao De-Peng Jiang 《Statistics》2017,51(2):363-386

Quantitle regression (QR) is a popular approach to estimate functional relations between variables for all portions of a probability distribution. Parameter estimation in QR with missing data is one of the most challenging issues in statistics. Regression quantiles can be substantially biased when observations are subject to missingness. We study several inverse probability weighting (IPW) estimators for parameters in QR when covariates or responses are subject to missing not at random. Maximum likelihood and semiparametric likelihood methods are employed to estimate the respondent probability function. To achieve nice efficiency properties, we develop an empirical likelihood (EL) approach to QR with the auxiliary information from the calibration constraints. The proposed methods are less sensitive to misspecified missing mechanisms. Asymptotic properties of the proposed IPW estimators are shown under general settings. The efficiency gain of EL-based IPW estimator is quantified theoretically. Simulation studies and a data set on the work limitation of injured workers from Canada are used to illustrated our proposed methodologies. 相似文献

9.

Abdul-Karim Iddrisu Freedom Gumedze 《Journal of applied statistics》2019,46(4):754-769

In this paper, we investigate the effect of tuberculosis pericarditis (TBP) treatment on CD4 count changes over time and draw inferences in the presence of missing data. We accounted for missing data and conducted sensitivity analyses to assess whether inferences under missing at random (MAR) assumption are sensitive to not missing at random (NMAR) assumptions using the selection model (SeM) framework. We conducted sensitivity analysis using the local influence approach and stress-testing analysis. Our analyses showed that the inferences from the MAR are robust to the NMAR assumption and influential subjects do not overturn the study conclusions about treatment effects and the dropout mechanism. Therefore, the missing CD4 count measurements are likely to be MAR. The results also revealed that TBP treatment does not interact with HIV/AIDS treatment and that TBP treatment has no significant effect on CD4 count changes over time. Although the methods considered were applied to data in the IMPI trial setting, the methods can also be applied to clinical trials with similar settings. 相似文献

10.

Current status data with two competing risks and missing failure types: a parametric approach

Tamalika Koley Anup Dewanji 《Journal of applied statistics》2022,49(7):1769

相似文献

11.

The impact of missing data on the results of a schizophrenia study

下载免费PDF全文

Denis Rybin Gheorghe Doros Robert Rosenheck Robert Lew 《Pharmaceutical statistics》2015,14(1):4-10

Missing data pose a serious challenge to the integrity of randomized clinical trials, especially of treatments for prolonged illnesses such as schizophrenia, in which long‐term impact assessment is of great importance, but the follow‐up rates are often no more than 50%. Sensitivity analysis using Bayesian modeling for missing data offers a systematic approach to assessing the sensitivity of the inferences made on the basis of observed data. This paper uses data from an 18‐month study of veterans with schizophrenia to demonstrate this approach. Data were obtained from a randomized clinical trial involving 369 patients diagnosed with schizophrenia that compared long‐acting injectable risperidone with a psychiatrist's choice of oral treatment. Bayesian analysis utilizing a pattern‐mixture modeling approach was used to validate the reported results by detecting bias due to non‐random patterns of missing data. The analysis was applied to several outcomes including standard measures of schizophrenia symptoms, quality of life, alcohol use, and global mental status. The original study results for several measures were confirmed against a wide range of patterns of non‐random missingness. Robustness of the conclusions was assessed using sensitivity parameters. The missing data in the trial did not likely threaten the validity of previously reported results. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

12.

Comparing diagnostic tests with missing data

Frederico Z. Poleto Julio M. Singer Carlos Daniel Paulino 《Journal of applied statistics》2011,38(6):1207-1222

When missing data occur in studies designed to compare the accuracy of diagnostic tests, a common, though naive, practice is to base the comparison of sensitivity, specificity, as well as of positive and negative predictive values on some subset of the data that fits into methods implemented in standard statistical packages. Such methods are usually valid only under the strong missing completely at random (MCAR) assumption and may generate biased and less precise estimates. We review some models that use the dependence structure of the completely observed cases to incorporate the information of the partially categorized observations into the analysis and show how they may be fitted via a two-stage hybrid process involving maximum likelihood in the first stage and weighted least squares in the second. We indicate how computational subroutines written in R may be used to fit the proposed models and illustrate the different analysis strategies with observational data collected to compare the accuracy of three distinct non-invasive diagnostic methods for endometriosis. The results indicate that even when the MCAR assumption is plausible, the naive partial analyses should be avoided. 相似文献

13.

Performances of Bayesian model selection criteria for generalized linear models with non-ignorably missing covariates

《Journal of Statistical Computation and Simulation》2012,82(8):1670-1691

This article deals with model comparison as an essential part of generalized linear modelling in the presence of covariates missing not at random (MNAR). We provide an evaluation of the performances of some of the popular model selection criteria, particularly of deviance information criterion (DIC) and weighted L (WL) measure, for comparison among a set of candidate MNAR models. In addition, we seek to provide deviance and quadratic loss-based model selection criteria with alternative penalty terms targeting directly the MNAR models. This work is motivated by the need in the literature to understand the performances of these important model selection criteria for comparison among a set of MNAR models. A Monte Carlo simulation experiment is designed to assess the finite sample performances of these model selection criteria in the context of interest under different scenarios for missingness amounts. Some naturally driven DIC and WL extensions are also discussed and evaluated. 相似文献

14.

High breakdown point robust estimators with missing data

Florencia Statti Victor J. Yohai 《统计学通讯:理论与方法》2018,47(21):5145-5162

In this paper, we propose a new procedure to estimate the distribution of a variable y when there are missing data. To compensate the presence of missing responses, it is assumed that a covariate vector x is observed and that y and x are related by means of a semi-parametric regression model. Observed residuals are combined with predicted values to estimate the missing response distribution. Once the responses distribution is consistently estimated, we can estimate any parameter defined through a continuous functional T using a plug in procedure. We prove that the proposed estimators have high breakdown point. 相似文献

15.

A calibrated imputation method for secondary data analysis of survey data

Damio N. Da Silva Li‐Chun Zhang 《Scandinavian Journal of Statistics》2021,48(1):25-41

In practical survey sampling, missing data are unavoidable due to nonresponse, rejected observations by editing, disclosure control, or outlier suppression. We propose a calibrated imputation approach so that valid point and variance estimates of the population (or domain) totals can be computed by the secondary users using simple complete‐sample formulae. This is especially helpful for variance estimation, which generally require additional information and tools that are unavailable to the secondary users. Our approach is natural for continuous variables, where the estimation may be either based on reweighting or imputation, including possibly their outlier‐robust extensions. We also propose a multivariate procedure to accommodate the estimation of the covariance matrix between estimated population totals, which facilitates variance estimation of the ratios or differences among the estimated totals. We illustrate the proposed approach using simulation data in supplementary materials that are available online. 相似文献

16.

Using the EM-algorithm for survival data with incomplete categorical covariates

Stuart R. Lipsitz Joseph G. Ibrahim 《Lifetime data analysis》1996,2(1):5-14

Incomplete covariate data is a common occurrence in many studies in which the outcome is survival time. With generalized linear models, when the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM by the method of weights proposed in Ibrahim (1990). In this article, we extend the EM by the method of weights to survival outcomes whose distributions may not fall in the class of generalized linear models. This method requires the estimation of the parameters of the distribution of the covariates. We present a clinical trials example with five covariates, four of which have some missing values. 相似文献

17.

Handling of missing data in long‐term clinical trials: a case study

Mark Janssens Geert Molenberghs René Kerstens 《Pharmaceutical statistics》2012,11(6):442-448

Missing data in clinical trials is a well‐known problem, and the classical statistical methods used can be overly simple. This case study shows how well‐established missing data theory can be applied to efficacy data collected in a long‐term open‐label trial with a discontinuation rate of almost 50%. Satisfaction with treatment in chronically constipated patients was the efficacy measure assessed at baseline and every 3 months postbaseline. The improvement in treatment satisfaction from baseline was originally analyzed with a paired t‐test ignoring missing data and discarding the correlation structure of the longitudinal data. As the original analysis started from missing completely at random assumptions regarding the missing data process, the satisfaction data were re‐examined, and several missing at random (MAR) and missing not at random (MNAR) techniques resulted in adjusted estimate for the improvement in satisfaction over 12 months. Throughout the different sensitivity analyses, the effect sizes remained significant and clinically relevant. Thus, even for an open‐label trial design, sensitivity analysis, with different assumptions for the nature of dropouts (MAR or MNAR) and with different classes of models (selection, pattern‐mixture, or multiple imputation models), has been found useful and provides evidence towards the robustness of the original analyses; additional sensitivity analyses could be undertaken to further qualify robustness. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

18.

Semiparametric inference for estimating equations with nonignorably missing covariates

Ji Chen Fang Fang 《Journal of nonparametric statistics》2018,30(3):796-812

We consider statistical inference of unknown parameters in estimating equations (EEs) when some covariates have nonignorably missing values, which is quite common in practice but has rarely been discussed in the literature. When an instrument, a fully observed covariate vector that helps identifying parameters under nonignorable missingness, is available, the conditional distribution of the missing covariates given other covariates can be estimated by the pseudolikelihood method of Zhao and Shao [(2015), ‘Semiparametric pseudo likelihoods in generalised linear models with nonignorable missing data’, Journal of the American Statistical Association, 110, 1577–1590)] and be used to construct unbiased EEs. These modified EEs then constitute a basis for valid inference by empirical likelihood. Our method is applicable to a wide range of EEs used in practice. It is semiparametric since no parametric model for the propensity of missing covariate data is assumed. Asymptotic properties of the proposed estimator and the empirical likelihood ratio test statistic are derived. Some simulation results and a real data analysis are presented for illustration. 相似文献

19.

Using multiple imputation to estimate cumulative distribution functions in longitudinal data analysis with data missing at random

Phillip Dinh 《Pharmaceutical statistics》2013,12(5):260-267

In longitudinal clinical studies, after randomization at baseline, subjects are followed for a period of time for development of symptoms. The interested inference could be the mean change from baseline to a particular visit in some lab values, the proportion of responders to some threshold category at a particular visit post baseline, or the time to some important event. However, in some applications, the interest may be in estimating the cumulative distribution function (CDF) at a fixed time point post baseline. When the data are fully observed, the CDF can be estimated by the empirical CDF. When patients discontinue prematurely during the course of the study, the empirical CDF cannot be directly used. In this paper, we use multiple imputation as a way to estimate the CDF in longitudinal studies when data are missing at random. The validity of the method is assessed on the basis of the bias and the Kolmogorov–Smirnov distance. The results suggest that multiple imputation yields less bias and less variability than the often used last observation carried forward method. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

20.

An index of local sensitivity to non-ignorability for parametric survival models with potential non-random missing covariate: an application to the SEER cancer registry data

S. Eftekhari Mahabadi 《Journal of applied statistics》2012,39(11):2327-2348

Several survival regression models have been developed to assess the effects of covariates on failure times. In various settings, including surveys, clinical trials and epidemiological studies, missing data may often occur due to incomplete covariate data. Most existing methods for lifetime data are based on the assumption of missing at random (MAR) covariates. However, in many substantive applications, it is important to assess the sensitivity of key model inferences to the MAR assumption. The index of sensitivity to non-ignorability (ISNI) is a local sensitivity tool to measure the potential sensitivity of key model parameters to small departures from the ignorability assumption, needless of estimating a complicated non-ignorable model. We extend this sensitivity index to evaluate the impact of a covariate that is potentially missing, not at random in survival analysis, using parametric survival models. The approach will be applied to investigate the impact of missing tumor grade on post-surgical mortality outcomes in individuals with pancreas-head cancer in the Surveillance, Epidemiology, and End Results data set. For patients suffering from cancer, tumor grade is an important risk factor. Many individuals in these data with pancreas-head cancer have missing tumor grade information. Our ISNI analysis shows that the magnitude of effect for most covariates (with significant effect on the survival time distribution), specifically surgery and tumor grade as some important risk factors in cancer studies, highly depends on the missing mechanism assumption of the tumor grade. Also a simulation study is conducted to evaluate the performance of the proposed index in detecting sensitivity of key model parameters. 相似文献