期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Practical Advice on How to Impute Continuous Data When the Ultimate Interest Centers on Dichotomized Outcomes Through Pre-Specified Thresholds

Hakan Demirtas 《统计学通讯:模拟与计算》2013,42(4):871-889

Multiple imputation under the multivariate normality assumption has often been regarded as a viable model-based approach in dealing with incomplete continuous data in the last two decades. A situation where the measurements are taken on a continuous scale with an ultimate interest in dichotomized versions through discipline-specific thresholds is not uncommon in applied research, especially in medical and social sciences. In practice, researchers generally tend to impute missing values for continuous outcomes under a Gaussian imputation model, and then dichotomize them via commonly-accepted cut-off points. An alternative strategy is creating multiply imputed data sets after dichotomization under a log-linear imputation model that uses a saturated multinomial structure. In this work, the performances of the two imputation methods were examined on a fairly wide range of simulated incomplete data sets that exhibit varying distributional characteristics such as skewness and multimodality. Behavior of efficiency and accuracy measures was explored to determine the extent to which the procedures work properly. The conclusion drawn is that dichotomization before carrying out a log-linear imputation should be the preferred approach except for a few special cases. I recommend that researchers use the atypical second strategy whenever the interest centers on binary quantities that are obtained through underlying continuous measurements. A possible explanation is that erratic/idiosyncratic aspects that are not accommodated by a Gaussian model are probably transformed into better-behaving discrete trends in this particular missing-data setting. This premise outweighs the assertion that continuous variables inherently carry more information, leading to a counter-intuitive, but potentially useful result for practitioners. 相似文献

2.

Sensitivity to imputation models and assumptions in receiver operating characteristic analysis with incomplete data

《Journal of Statistical Computation and Simulation》2012,82(17):3498-3511

Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also been increasingly popular problem in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. Particularly, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. 相似文献

3.

A semiparametric method of multiple imputation

Stuart R. Lipsitz Lue Ping Zhao & Geert Molenberghs 《Journal of the Royal Statistical Society. Series B, Statistical methodology》1998,60(1):127-144

In this paper, we describe how to use multiple imputation semiparametrically to obtain estimates of parameters and their standard errors when some individuals have missing data. The methods given require the investigator to know or be able to estimate the process generating the missing data but requires no full distributional form for the data. The method is especially useful for non-standard problems, such as estimating the median when data are missing. 相似文献

4.

Variance estimation when donor imputation is used to fill in missing values

Jean‐François Beaumont Cynthia Bocci 《Revue canadienne de statistique》2009,37(3):400-416

Donor imputation is frequently used in surveys. However, very few variance estimation methods that take into account donor imputation have been developed in the literature. This is particularly true for surveys with high sampling fractions using nearest donor imputation, often called nearest‐neighbour imputation. In this paper, the authors develop a variance estimator for donor imputation based on the assumption that the imputed estimator of a domain total is approximately unbiased under an imputation model; that is, a model for the variable requiring imputation. Their variance estimator is valid, irrespective of the magnitude of the sampling fractions and the complexity of the donor imputation method, provided that the imputation model mean and variance are accurately estimated. They evaluate its performance in a simulation study and show that nonparametric estimation of the model mean and variance via smoothing splines brings robustness with respect to imputation model misspecifications. They also apply their variance estimator to real survey data when nearest‐neighbour imputation has been used to fill in the missing values. The Canadian Journal of Statistics 37: 400–416; 2009 © 2009 Statistical Society of Canada 相似文献

5.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered. 相似文献

6.

Doubly robust point and variance estimation in the presence of imputed survey data

David Haziza Frédéric Picard 《Revue canadienne de statistique》2012,40(2):259-281

Imputation is often used in surveys to treat item nonresponse. It is well known that treating the imputed values as observed values may lead to substantial underestimation of the variance of the point estimators. To overcome the problem, a number of variance estimation methods have been proposed in the literature, including resampling methods such as the jackknife and the bootstrap. In this paper, we consider the problem of doubly robust inference in the presence of imputed survey data. In the doubly robust literature, point estimation has been the main focus. In this paper, using the reverse framework for variance estimation, we derive doubly robust linearization variance estimators in the case of deterministic and random regression imputation within imputation classes. Also, we study the properties of several jackknife variance estimators under both negligible and nonnegligible sampling fractions. A limited simulation study investigates the performance of various variance estimators in terms of relative bias and relative stability. Finally, the asymptotic normality of imputed estimators is established for stratified multistage designs under both deterministic and random regression imputation. The Canadian Journal of Statistics 40: 259–281; 2012 © 2012 Statistical Society of Canada 相似文献

7.

Imputation by power transformation

Sarjinder?Singh Email author Balbinder?Deo 《Statistical Papers》2003,44(4):555-579

In this paper, a new power transformation estimator of population mean in the presence of non-response has been suggested. The estimator of mean obtained from proposed technique remains better than the estimators obtained from ratio or mean methods of imputation. The mean squared error of the resultant estimator is less than that of the estimator obtained on the basis of ratio method of imputation for the optinum choice of parameters. An estimator for estimating a parameter involved in the process of new method of imputation has been discussed. The MSE expressions for the proposed estimators have been derived analytically and compared empirically. Product method of imputation for negatively correlated variables has also been introduced. The work has been extended to the case of multi-auxiliary information to be used for imputation. 相似文献

8.

Missing item imputation for quality-of-life instruments with application to asthma quality-of-life questionnaires

Wang J Rapatz G Lowy A Olson S Kuebler J 《Pharmaceutical statistics》2009,8(1):73-83

There has been increasing use of quality-of-life (QoL) instruments in drug development. Missing item values often occur in QoL data. A common approach to solve this problem is to impute the missing values before scoring. Several imputation procedures, such as imputing with the most correlated item and imputing with a row/column model or an item response model, have been proposed. We examine these procedures using data from two clinical trials, in which the original asthma quality-of-life questionnaire (AQLQ) and the miniAQLQ were used. We propose two modifications to existing procedures: truncating the imputed values to eliminate outliers and using the proportional odds model as the item response model for imputation. We also propose a novel imputation method based on a semi-parametric beta regression so that the imputed value is always in the correct range and illustrate how this approach can easily be implemented in commonly used statistical software. To compare these approaches, we deleted 5% of item values in the data according to three different missingness mechanisms, imputed them using these approaches and compared the imputed values with the true values. Our comparison showed that the row/column-model-based imputation with truncation generally performed better, whereas our new approach had better performance under a number scenarios. 相似文献

9.

Treatment policy estimands for recurrent event data using data collected after cessation of randomised treatment

James H. Roger Daniel J. Bratton Bhabita Mayer Juan J. Abellan Oliver N. Keene 《Pharmaceutical statistics》2019,18(1):85-95

In the past, many clinical trials have withdrawn subjects from the study when they prematurely stopped their randomised treatment and have therefore only collected ‘on‐treatment’ data. Thus, analyses addressing a treatment policy estimand have been restricted to imputing missing data under assumptions drawn from these data only. Many confirmatory trials are now continuing to collect data from subjects in a study even after they have prematurely discontinued study treatment as this event is irrelevant for the purposes of a treatment policy estimand. However, despite efforts to keep subjects in a trial, some will still choose to withdraw. Recent publications for sensitivity analyses of recurrent event data have focused on the reference‐based imputation methods commonly applied to continuous outcomes, where imputation for the missing data for one treatment arm is based on the observed outcomes in another arm. However, the existence of data from subjects who have prematurely discontinued treatment but remained in the study has now raised the opportunity to use this ‘off‐treatment’ data to impute the missing data for subjects who withdraw, potentially allowing more plausible assumptions for the missing post‐study‐withdrawal data than reference‐based approaches. In this paper, we introduce a new imputation method for recurrent event data in which the missing post‐study‐withdrawal event rate for a particular subject is assumed to reflect that observed from subjects during the off‐treatment period. The method is illustrated in a trial in chronic obstructive pulmonary disease (COPD) where the primary endpoint was the rate of exacerbations, analysed using a negative binomial model. 相似文献

10.

Comparison of methods for incomplete repeated measures data analysis in small samples

《Journal of statistical planning and inference》2006,136(1):235-247

This paper presents missing data methods for repeated measures data in small samples. Most methods currently available are for large samples. In particular, no studies have compared the performance of multiple imputation methods to that of non-imputation incomplete analysis methods. We first develop a strategy for multiple imputations for repeated measures data under a cell-means model that is applicable for any multivariate data with small samples. Multiple imputation inference procedures are applied to the resulting multiply imputed complete data sets. Comparisons to other available non-imputation incomplete data methods is made via simulation studies to conclude that there is not much gain in using the computer intensive multiple imputation methods for small sample repeated measures data analysis in terms of the power of testing hypotheses of parameters of interest. 相似文献

11.

Adjusted Supremum Score‐Type Statistics for Evaluating Non‐Standard Hypotheses

下载免费PDF全文

Wei‐Wen Hsu David Todem Kyungmann Kim 《Scandinavian Journal of Statistics》2015,42(3):746-759

Supremum score test statistics are often used to evaluate hypotheses with unidentifiable nuisance parameters under the null hypothesis. Although these statistics provide an attractive framework to address non‐identifiability under the null hypothesis, little attention has been paid to their distributional properties in small to moderate sample size settings. In situations where there are identifiable nuisance parameters under the null hypothesis, these statistics may behave erratically in realistic samples as a result of a non‐negligible bias induced by substituting these nuisance parameters by their estimates under the null hypothesis. In this paper, we propose an adjustment to the supremum score statistics by subtracting the expected bias from the score processes and show that this adjustment does not alter the limiting null distribution of the supremum score statistics. Using a simple example from the class of zero‐inflated regression models for count data, we show empirically and theoretically that the adjusted tests are superior in terms of size and power. The practical utility of this methodology is illustrated using count data in HIV research. 相似文献

12.

Mass imputation for two-phase sampling

《Journal of the Korean Statistical Society》2019,48(4):578-592

Two-phase sampling is a cost-effective method of data collection using outcome-dependent sampling for the second-phase sample. In order to make efficient use of auxiliary information and to improve domain estimation, mass imputation can be used in two-phase sampling. Rao and Sitter (1995) introduce mass imputation for two-phase sampling and its variance estimation under simple random sampling in both phases. In this paper, we extend the Rao–Sitter method to general sampling design. The proposed method is further extended to mass imputation for categorical data. A limited simulation study is performed to examine the performance of the proposed methods. 相似文献

13.

Missing data sensitivity analysis for recurrent event data using controlled imputation

Oliver N. Keene James H. Roger Benjamin F. Hartley Michael G. Kenward 《Pharmaceutical statistics》2014,13(4):258-264

Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference‐based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

14.

Comparisons of imputation methods with application to assess factors associated with self efficacy of physical activity in breast cancer survivors

Yunxi Zhang Ye Lin George Baum Karen M. Basen-Engquist Michael D. Swartz 《统计学通讯:模拟与计算》2013,42(8):2523-2537

ABSTRACT

Missing data are commonly encountered in self-reported measurements and questionnaires. It is crucial to treat missing values using appropriate method to avoid bias and reduction of power. Various types of imputation methods exist, but it is not always clear which method is preferred for imputation of data with non-normal variables. In this paper, we compared four imputation methods: mean imputation, quantile imputation, multiple imputation, and quantile regression multiple imputation (QRMI), using both simulated and real data investigating factors affecting self-efficacy in breast cancer survivors. The results displayed an advantage of using multiple imputation, especially QRMI when data are not normal. 相似文献

15.

Nonparametric methods for the estimation of imputation uncertainty

Akbar Heydarbeygie Nima Ahmadi 《Journal of applied statistics》2013,40(3):693-698

It is cleared in recent researches that the raising of missing values in datasets is inevitable. Imputation of missing data is one of the several methods which have been introduced to overcome this issue. Imputation techniques are trying to answer the case of missing data by covering missing values with reasonable estimates permanently. There are a lot of benefits for these procedures rather than their drawbacks. The operation of these methods has not been clarified, which means that they provide mistrust among analytical results. One approach to evaluate the outcomes of the imputation process is estimating uncertainty in the imputed data. Nonparametric methods are appropriate to estimating the uncertainty when data are not followed by any particular distribution. This paper deals with a nonparametric method for estimation and testing the significance of the imputation uncertainty, which is based on Wilcoxon test statistic, and which could be employed for estimating the precision of the imputed values created by imputation methods. This proposed procedure could be employed to judge the possibility of the imputation process for datasets, and to evaluate the influence of proper imputation methods when they are utilized to the same dataset. This proposed approach has been compared with other nonparametric resampling methods, including bootstrap and jackknife to estimate uncertainty in the imputed data under the Bayesian bootstrap imputation method. The ideas supporting the proposed method are clarified in detail, and a simulation study, which indicates how the approach has been employed in practical situations, is illustrated. 相似文献

16.

Calibrated random imputation for qualitative data

《Journal of statistical planning and inference》2005,128(2):411-425

In official statistics, when a file of microdata must be delivered to external users, it is very difficult to propose them a file where missing values has been treated by multiple imputations. In order to overcome this difficulty, we propose a method of single imputation for qualitative data that respect numerous constraints. The imputation is balanced on totals previously estimated; editing rules can be respected; the imputation is random, but the totals are not affected by an imputation variance. 相似文献

17.

Imputation techniques for incomplete data in quadratic discriminant analysis

《Journal of Statistical Computation and Simulation》2012,82(6):863-877

We have compared the efficacy of five imputation algorithms readily available in SAS for the quadratic discriminant function. Here, we have generated several different parametric-configuration training data with missing data, including monotone missing-at-random observations, and used a Monte Carlo simulation to examine the expected probabilities of misclassification for the two-class quadratic statistical discrimination problem under five different imputation methods. Specifically, we have compared the efficacy of the complete observation-only method and the mean substitution, regression, predictive mean matching, propensity score, and Markov Chain Monte Carlo (MCMC) imputation methods. We found that the MCMC and propensity score multiple imputation approaches are, in general, superior to the other imputation methods for the configurations and training-sample sizes we considered. 相似文献

18.

On a measure of information gain for regression models in survival analysis

Delphine Maucort-Boulch Pascal Roy Janez Stare 《Journal of applied statistics》2014,41(12):2696-2708

Papers dealing with measures of predictive power in survival analysis have seen their independence of censoring, or their estimates being unbiased under censoring, as the most important property. We argue that this property has been wrongly understood. Discussing the so-called measure of information gain, we point out that we cannot have unbiased estimates if all values, greater than a given time τ, are censored. This is due to the fact that censoring before τ has a different effect than censoring after τ. Such τ is often introduced by design of a study. Independence can only be achieved under the assumption of the model being valid after τ, which is impossible to verify. But if one is willing to make such an assumption, we suggest using multiple imputation to obtain a consistent estimate. We further show that censoring has different effects on the estimation of the measure for the Cox model than for parametric models, and we discuss them separately. We also give some warnings about the usage of the measure, especially when it comes to comparing essentially different models. 相似文献

19.

Local regression when the responses are Interval-Censored

《Journal of Statistical Computation and Simulation》2012,82(10):1247-1279

Conditional expectation imputation and local-likelihood methods are contrasted with a midpoint imputation method for bivariate regression involving interval-censored responses. Although the methods can be extended in principle to higher order polynomials, our focus is on the local constant case. Comparisons are based on simulations of data scattered about three target functions with normally distributed errors. Two censoring mechanisms are considered: the first is analogous to current-status data in which monitoring times occur according to a homogeneous Poisson process; the second is analogous to a coarsening mechanism such as would arise when the response values are binned. We find that, according to a pointwise MSE criterion, no method dominates any other when interval sizes are fixed, but when the intervals have a variable width, the local-likelihood method often performs better than the other methods, and midpoint imputation performs the worst. Several illustrative examples are presented. 相似文献

20.

A distance-based rounding strategy for post-imputation ordinal data

Hakan Demirtas 《Journal of applied statistics》2010,37(3):489-500

Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers for pragmatically treating the discrete variables as continuous for imputation purposes, and subsequently rounding the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets. 相似文献