Similar Articles (20 results)
1.
In this article, we compare alternative missing data imputation methods in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variable) models. Various imputation methods, both univariate and multivariate, are considered. The first step is a simulation study, designed by varying the parameters of the CUB model, to assess and compare CUB-based imputation against other imputation methods. We then use real datasets to compare our approach with some general-purpose imputation methods under various missing data mechanisms.
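For orientation, a minimal sketch (in Python; the parameter values are illustrative rather than taken from the article) of how ordinal responses arise from a CUB(π, ξ) model with m categories: with probability π the response comes from a shifted Binomial ("feeling") component, otherwise from a discrete Uniform ("uncertainty") component.

```python
import numpy as np

def rcub(n, m, pi, xi, rng=None):
    """Draw n ordinal responses from a CUB(pi, xi) model with m categories."""
    rng = np.random.default_rng(rng)
    from_feeling = rng.random(n) < pi                          # latent component indicator
    shifted_binom = 1 + rng.binomial(m - 1, 1 - xi, size=n)    # "feeling" component
    uniform = rng.integers(1, m + 1, size=n)                   # "uncertainty" component
    return np.where(from_feeling, shifted_binom, uniform)

y = rcub(1000, m=5, pi=0.7, xi=0.3, rng=42)                    # 5-point scale example
```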

2.
Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable to accommodate different skewness and elongation in the distribution of variables arising in practical applications. It is easy to draw values from this distribution even though it is hard to explicitly state the probability density function. Given this flexibility, the gh family may be extremely useful in creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus of this article is on a scalar variable with missing values. In the absence of any additional information, data are missing completely at random, and hence the correct analysis is the complete-case analysis. Thus, the application of the gh multiple imputation to the scalar cases affords comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and the data from a survey of adolescents ascertaining driving after drinking alcohol.
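The abstract's point that draws are easy even though the density is only implicit can be made concrete. Below is a minimal sketch (Python; the g, h, location and scale values are illustrative assumptions) of sampling from Tukey's g-and-h family by transforming standard normal draws.

```python
import numpy as np

def rgh(n, g=0.5, h=0.1, loc=0.0, scale=1.0, rng=None):
    """Draw from Tukey's g-and-h distribution: g controls skewness, h controls tail heaviness."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(n)
    if g == 0:
        t = z * np.exp(h * z**2 / 2)                      # symmetric h-distribution
    else:
        t = (np.expm1(g * z) / g) * np.exp(h * z**2 / 2)  # g-and-h transformation
    return loc + scale * t

draws = rgh(10_000, g=0.5, h=0.1, rng=0)                  # skewed, heavy-tailed sample
```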

3.
A general nonparametric imputation procedure, based on kernel regression, is proposed to estimate points as well as set- and function-indexed parameters when the data are missing at random (MAR). The proposed method works by imputing a specific function of a missing value (and not the missing value itself), where the form of this specific function is dictated by the parameter of interest. Both single and multiple imputations are considered. The associated empirical processes provide the right tool to study the uniform convergence properties of the resulting estimators. Our estimators include, as special cases, the imputation estimator of the mean, the estimator of the distribution function proposed by Cheng and Chu [1996. Kernel estimation of distribution functions and quantiles with missing data. Statist. Sinica 6, 63–78], imputation estimators of a marginal density, and imputation estimators of regression functions.
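As the simplest special case of this idea (imputing a kernel-regression prediction rather than the raw missing value), here is a minimal sketch in Python of single imputation of a missing response by Nadaraya-Watson kernel regression; the bandwidth and toy data are assumptions, and the paper's general set- and function-indexed versions are not covered.

```python
import numpy as np

def nw_impute(x, y, bandwidth=0.5):
    """Replace missing y-values (np.nan) with a Gaussian-kernel weighted mean of observed responses."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    for i in np.flatnonzero(~obs):
        w = np.exp(-0.5 * ((x[obs] - x[i]) / bandwidth) ** 2)   # kernel weights centred at x_i
        y[i] = np.sum(w * y[obs]) / np.sum(w)
    return y

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)
y[::10] = np.nan                                             # make every 10th response missing
y_completed = nw_impute(x, y, bandwidth=0.1)
```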

4.
In a general parametric setup, a multivariate regression model is considered when responses may be missing at random while the explanatory variables and covariates are completely observed. Asymptotic optimality properties of maximum likelihood estimators for such models are linked to the Fisher information matrix for the parameters. It is shown that the information matrix is well defined for the missing-at-random model and that it plays the same role as in the complete-data linear models. Applications of the methodologic developments in hypothesis-testing problems, without any imputation of missing data, are illustrated. Some simulation results comparing the proposed method with Rubin's multiple imputation method are presented.

5.
In this article, a finite mixture model of hurdle Poisson distributions with missing outcomes is proposed, and a stochastic EM algorithm is developed for obtaining the maximum likelihood estimates of the model parameters and mixing proportions. Specifically, the missing data are assumed to be missing not at random (MNAR), i.e. non-ignorable non-response (NINR), and the corresponding missingness mechanism is modeled through probit regression. To improve the algorithm's efficiency, a stochastic step based on data augmentation is incorporated into the E-step, whereas the M-step is solved by conditional maximization. A variation on the Bayesian information criterion (BIC) is also proposed to compare models with different numbers of components in the presence of missing values. The considered model is a general framework that captures important characteristics of count data such as zero inflation/deflation, heterogeneity, and missingness, providing more insight into the features of the data and allowing dispersion to be investigated more fully and correctly. Since the stochastic step only involves simulating samples from standard distributions, the computational burden is alleviated. Once missing responses and latent variables are imputed in place of the conditional expectation, our approach works as part of a multiple imputation procedure. A simulation study and a real example illustrate the usefulness and effectiveness of our methodology.
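To fix ideas about the data model (not the estimation algorithm), a minimal sketch in Python of drawing counts from a finite mixture of hurdle-Poisson components; the mixing weights, hurdle probabilities and rates are illustrative assumptions.

```python
import numpy as np

def r_hurdle_poisson_mix(n, weights, p0, lam, rng=None):
    """Draw counts from a finite mixture of hurdle-Poisson components.

    weights -- mixing proportions; p0 -- per-component probability of a zero
    (the hurdle); lam -- per-component Poisson rate of the positive part.
    """
    rng = np.random.default_rng(rng)
    weights, p0, lam = map(np.asarray, (weights, p0, lam))
    comp = rng.choice(len(weights), size=n, p=weights)   # latent component labels
    y = np.zeros(n, dtype=int)
    cleared = rng.random(n) >= p0[comp]                  # observations clearing the hurdle
    for i in np.flatnonzero(cleared):
        draw = 0
        while draw == 0:                                 # zero-truncated Poisson by rejection
            draw = rng.poisson(lam[comp[i]])
        y[i] = draw
    return y

y = r_hurdle_poisson_mix(500, weights=[0.6, 0.4], p0=[0.5, 0.2], lam=[1.5, 4.0], rng=1)
```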

6.
Missing data often complicate the analysis of scientific data. Multiple imputation is a general-purpose technique for the analysis of datasets with missing values. The approach is applicable to a variety of missing data patterns but is often complicated by restrictions such as the type of variables to be imputed and the mechanism underlying the missing data. In this paper, the authors compare the performance of two multiple imputation methods, namely fully conditional specification and multivariate normal imputation, in the presence of ordinal outcomes with monotone missing data patterns. Through a simulation study and an empirical example, the authors show that the two methods are indeed comparable, meaning that either may be used in scenarios at least similar to the ones presented here.
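Fully conditional specification (chained equations) is widely available in standard software. As a hedged illustration, here is a minimal sketch using scikit-learn's experimental IterativeImputer, which is an FCS-style imputer but not one of the specific implementations compared in the paper; the toy data and settings are assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the class below)
from sklearn.impute import IterativeImputer

# Toy data: three correlated continuous columns with roughly 15% of cells set missing.
rng = np.random.default_rng(0)
cov = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=200)
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.15] = np.nan

# Each incomplete column is modelled conditionally on the others and updated
# iteratively -- the fully-conditional-specification idea in its simplest form.
imputer = IterativeImputer(sample_posterior=True, random_state=0)
X_completed = imputer.fit_transform(X_miss)
```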

7.
The multiple imputation technique has proven to be a useful tool in missing data analysis. We propose a Markov chain Monte Carlo method to conduct multiple imputation for incomplete correlated ordinal data using the multivariate probit model. We conduct a thorough simulation study to compare the performance of our proposed method with two available imputation methods, multivariate normal-based and chained-equation methods, for various missing data scenarios. For illustration, we present an application using data from a smoking cessation treatment study of low-income community corrections smokers.

8.
In longitudinal studies, nonlinear mixed-effects models have been widely applied to describe the intra- and inter-subject variation in data. The inter-subject variation usually receives great attention, and it may be partially explained by time-dependent covariates. However, some covariates may be measured with substantial error and may contain missing values. We propose a multiple imputation method, implemented via a Markov chain Monte Carlo method with a Gibbs sampler, to address covariate measurement error and missing data in nonlinear mixed-effects models. The multiple imputation method is illustrated with a real data example. Simulation studies show that the multiple imputation method outperforms the commonly used naive methods.

9.
In this paper we propose a latent-class-based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete-case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation-Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when these criteria are considered jointly. A data example from a matched case-control study of the association between multiple myeloma and polymorphisms of the Interleukin-6 gene is considered.

10.
Traditional factor analysis (FA) rests on the assumption of multivariate normality. However, in some practical situations, the data do not meet this assumption; thus, the statistical inference made from such data may be misleading. This paper aims at providing some new tools for the skew-normal (SN) FA model when missing values occur in the data. In such a model, the latent factors are assumed to follow a restricted version of multivariate SN distribution with additional shape parameters for accommodating skewness. We develop an analytically feasible expectation conditional maximization algorithm for carrying out parameter estimation and imputation of missing values under missing at random mechanisms. The practical utility of the proposed methodology is illustrated with two real data examples and the results are compared with those obtained from the traditional FA counterparts.

11.
Missing data are a prevalent and widespread data-analytic issue, and previous studies have performed simulations to compare the performance of missing data methods in various contexts and for various models; however, one context that has yet to receive much attention in the literature is the handling of missing data with small samples, particularly when the missingness is arbitrary. Prior studies have either compared methods for small samples with the monotone missingness commonly found in longitudinal studies, or have investigated the performance of a single method for arbitrary missingness with small samples, but have yet to compare the relative performance of commonly implemented missing data methods for small samples with arbitrary missingness. This study uses simulation to compare and assess the small-sample performance of maximum likelihood, listwise deletion, joint multiple imputation, and fully conditional specification multiple imputation for a single-level regression model with a continuous outcome. Results showed that, provided assumptions are met, joint multiple imputation uniformly performed best of the methods examined across the conditions under study.

12.
In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi-parametric approaches are only valid when data are missing completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI-GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI-GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparison, likelihood-based linear mixed effects models (LMM) are used for the incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI-GEE with imputation of the original continuous missing data provides well controlled test sizes and more stable power estimates compared with any other GEE-based approach. It is also shown that dichotomizing a longitudinal continuous outcome results in a substantial loss of power compared with LMM.

13.
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main-effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time consuming on data sets of high dimension than the other multiple imputation methods.
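To convey the flavour of principal-component imputation of categorical data, here is a deliberately simplified single-imputation sketch in Python (iterative low-rank SVD on the one-hot indicator matrix). It is an assumption-laden stand-in rather than MIMCA itself: it omits MCA's chi-square weighting and the non-parametric bootstrap that produces the multiple imputations, and it assumes the categorical columns are plain string/object columns.

```python
import numpy as np
import pandas as pd

def pc_impute_categorical(df, n_components=2, n_iter=50):
    """Single imputation of categorical data via iterative low-rank reconstruction of the
    one-hot indicator matrix (a simplified, unweighted cousin of MCA-based imputation)."""
    dummies = pd.get_dummies(df)                       # one 0/1 column per observed level
    Z = dummies.to_numpy(dtype=float)
    miss = np.zeros_like(Z, dtype=bool)                # indicator cells belonging to missing values
    for col in df.columns:
        rows = df[col].isna().to_numpy()
        cols = dummies.columns.str.startswith(f"{col}_")
        miss[np.ix_(rows, cols)] = True
    Z[miss] = np.take(Z.mean(axis=0), np.where(miss)[1])    # start missing cells at column means
    for _ in range(n_iter):
        centre = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - centre, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + centre
        Z[miss] = low_rank[miss]                       # refresh only the missing cells
    out = df.copy()
    for col in df.columns:                             # decode: pick the level with the largest indicator
        idx = np.flatnonzero(dummies.columns.str.startswith(f"{col}_"))
        levels = [name[len(col) + 1:] for name in dummies.columns[idx]]
        for r in np.flatnonzero(df[col].isna().to_numpy()):
            out.iat[r, df.columns.get_loc(col)] = levels[int(Z[r, idx].argmax())]
    return out
```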

14.
Dealing with incomplete data is a pervasive problem in statistical surveys. Bayesian networks have recently been used in missing data imputation. In this research, we propose a new methodology for the multivariate imputation of missing data using discrete Bayesian networks and conditional Gaussian Bayesian networks. Results from imputing missing values in a coronary artery disease data set and a milk composition data set, as well as a simulation study from the cancer Neapolitan network, are presented to demonstrate and compare the performance of three Bayesian network-based imputation methods with those of multivariate imputation by chained equations (MICE) and the classical hot-deck imputation method. To assess the effect of the structure learning algorithm on the performance of the Bayesian network-based methods, two methods, the Peter-Clark algorithm and greedy search-and-score, have been applied. The Bayesian network-based methods are: first, the method introduced by Di Zio et al. [Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167 (2004), 309–322], in which each missing item of a variable is imputed using the information given in the parents of that variable; second, the method of Di Zio et al. [Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15 (2005), 303–310], which uses the information in the Markov blanket of the variable to be imputed; and finally, our newly proposed method, which applies all available knowledge about the variables of interest, consisting of the Markov blanket and hence the parent set, to impute a missing item. Results indicate the high quality of our new method, especially in the presence of high missingness percentages and more connected networks. The new method has also been shown to be more efficient than the MICE method for small sample sizes with high missing rates.

15.
Multiple imputation has emerged as a widely used model-based approach for dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, correct specification of the imputation model can be a daunting task owing to the lack of flexible models for the joint distribution of variables of different natures. This complication, along with the accessibility of software packages capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is to create indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures using simulated data sets.
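The final conversion step is easy to make concrete. A minimal sketch in Python of distance-based rounding of imputed level indicators back to an ordinal code; the numeric example is illustrative, and the earlier joint multivariate-normal imputation step is assumed to have produced the indicator values.

```python
import numpy as np

def distance_round(imputed_indicators, n_levels):
    """Map each row of imputed level indicators to the ordinal level whose exact
    0/1 indicator pattern is closest in Euclidean distance."""
    patterns = np.eye(n_levels)                                  # exact indicator patterns e_1, ..., e_K
    d = np.linalg.norm(imputed_indicators[:, None, :] - patterns[None, :, :], axis=2)
    return d.argmin(axis=1) + 1                                  # levels coded 1..K

# Three imputed indicator rows for a 4-level ordinal variable
z = np.array([[0.9, 0.2, -0.1, 0.0],
              [0.1, 0.4, 0.6, -0.2],
              [0.0, 0.1, 0.2, 0.8]])
print(distance_round(z, 4))   # -> [1 3 4]
```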

16.
The objective of this research was to demonstrate a framework for drawing inference from sensitivity analyses of incomplete longitudinal clinical trial data via a re-analysis of data from a confirmatory clinical trial in depression. A likelihood-based approach that assumed missing at random (MAR) was the primary analysis. Robustness to departure from MAR was assessed by comparing the primary result to those from a series of analyses that employed varying missing not at random (MNAR) assumptions (selection models, pattern mixture models and shared parameter models) and to MAR methods that used inclusive models. The key sensitivity analysis used multiple imputation assuming that, after dropout, the trajectory of drug-treated patients was that of placebo-treated patients with a similar outcome history (placebo multiple imputation). This result was used as the worst reasonable case to define the lower limit of plausible values for the treatment contrast. The endpoint contrast from the primary analysis was −2.79 (p = 0.013). In placebo multiple imputation, the result was −2.17. Results from the other sensitivity analyses ranged from −2.21 to −3.87 and were symmetrically distributed around the primary result. Hence, no clear evidence of bias from data missing not at random was found. In the worst reasonable case scenario, the treatment effect was 80% of the magnitude of the primary result. Therefore, it was concluded that a treatment effect existed. The structured sensitivity framework of using a worst-reasonable-case result, based on a controlled imputation approach with transparent and debatable assumptions, supplemented by a series of plausible alternative models under varying assumptions, was useful in this specific situation and holds promise as a generally useful framework.

17.
Missing data are an important factor affecting the quality of survey questionnaire data, and imputing the missing values in questionnaires can significantly improve data quality. Questionnaire data are predominantly categorical, classification algorithms from data mining are a common way to handle attribute classification problems, and the random forest model is among the most accurate of these algorithms. This paper introduces the random forest model into the imputation of missing questionnaire data, proposes a random-forest-based imputation method for missing categorical values, and discusses the corresponding imputation steps for different missingness patterns. Empirical simulation comparisons with other methods show that the imputed values obtained by random forest imputation are more accurate and more reliable.
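As an illustration of the approach described above, a minimal sketch in Python of random-forest imputation of one categorical questionnaire item using scikit-learn; the column names are hypothetical and the predictor columns are assumed to be already numerically coded. With several incomplete items, this step would be cycled item by item, as in chained-equation imputation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rf_impute_column(df, target, rng=0):
    """Fill the missing values of one categorical column with random forest predictions,
    using the remaining (numerically coded, complete) columns as predictors."""
    predictors = df.columns.drop(target)
    observed = df[target].notna()
    model = RandomForestClassifier(n_estimators=200, random_state=rng)
    model.fit(df.loc[observed, predictors], df.loc[observed, target])
    out = df.copy()
    out.loc[~observed, target] = model.predict(df.loc[~observed, predictors])
    return out

# Hypothetical questionnaire: 'q3' is categorical with missing entries, 'q1'/'q2' are complete.
rng = np.random.default_rng(1)
survey = pd.DataFrame({"q1": rng.integers(1, 6, 300), "q2": rng.integers(1, 6, 300)})
survey["q3"] = np.where(survey["q1"] + survey["q2"] > 6, "agree", "disagree")
survey.loc[rng.random(300) < 0.2, "q3"] = np.nan
survey_completed = rf_impute_column(survey, target="q3")
```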

18.
Statistical analyses of recurrent event data have typically been based on the missing at random assumption. One implication of this is that, if data are collected only when patients are on their randomized treatment, the resulting de jure estimator of treatment effect corresponds to the situation in which the patients adhere to this regime throughout the study. For confirmatory analysis of clinical trials, sensitivity analyses are required to investigate alternative de facto estimands that depart from this assumption. Recent publications have described the use of multiple imputation methods based on pattern mixture models for continuous outcomes, where imputation for the missing data for one treatment arm (e.g. the active arm) is based on the statistical behaviour of outcomes in another arm (e.g. the placebo arm). This has been referred to as controlled imputation or reference-based imputation. In this paper, we use the negative multinomial distribution to apply this approach to analyses of recurrent events and other similar outcomes. The methods are illustrated by a trial in severe asthma where the primary endpoint was rate of exacerbations and the primary analysis was based on the negative binomial model.

19.
Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm that is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of the MNI and ordinal imputation regression model for incomplete longitudinal ordinal data for situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, rates of missingness, well-balanced, and skewed data.

20.
In longitudinal studies, data are collected on the same set of units on more than one occasion. In medical studies it is very common to have mixed Poisson and continuous longitudinal data. In such studies, for different reasons, some intended measurements may be unavailable, resulting in a missing data setting. When the probability of missingness is related to the missing values, the missingness mechanism is termed nonrandom. The stochastic expectation-maximization (SEM) algorithm and the parametric fractional imputation (PFI) method are developed to handle nonrandom missingness in mixed discrete and continuous longitudinal data, assuming different covariance structures for the continuous outcome. The proposed techniques are evaluated using simulation studies. They are also applied to data from the Interstitial Cystitis Data Base (ICDB).
