Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Multiple imputation (MI) is now a reference solution for handling missing data. The default method for MI is the Multivariate Normal Imputation (MNI) algorithm, which is based on the multivariate normal distribution. In the presence of longitudinal ordinal missing data, where the Gaussian assumption is no longer valid, application of the MNI method is questionable. This simulation study compares the performance of MNI and an ordinal regression imputation model for incomplete longitudinal ordinal data, in situations covering various numbers of categories of the ordinal outcome, time occasions, sample sizes, and rates of missingness, and both well-balanced and skewed data.

2.
Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values are equal to the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. In some circumstances, with some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study the performance of the algorithm using simulation. The algorithm can be used effectively with FRHDI procedures for handling missing data in the context of a complex sample survey.
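
The idea of "predicted value plus multiple random residuals" can be illustrated with a minimal Python sketch. It assumes a simple linear regression of y on a single covariate x and equal starting fractional weights of 1/m; the names `fit_line` and `frhdi_impute` and the toy data are hypothetical and not taken from the article, and the article's adjustment for non-negative weights is not reproduced here.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def frhdi_impute(xs, ys, m=3, seed=0):
    """Return rows of (unit index, value, fractional weight).

    Observed units keep their value with weight 1. Each unit whose y is
    missing (None) receives m imputed values, each equal to the regression
    prediction plus a residual drawn from the observed units, with an equal
    starting fractional weight of 1/m (an assumption of this sketch).
    """
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in obs], [y for _, y in obs])
    residuals = [y - (a + b * x) for x, y in obs]
    rows = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        if y is not None:
            rows.append((i, y, 1.0))
        else:
            # m distinct donor residuals; requires m <= number of observed units
            for r in rng.sample(residuals, m):
                rows.append((i, a + b * x + r, 1.0 / m))
    return rows

# Toy data (hypothetical): two of six responses are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, None, 4.2, 4.8, None]
rows = frhdi_impute(xs, ys, m=3)
```

In this sketch the fractional weights of each unit sum to one, so a weighted total over `rows` counts every sampled unit exactly once.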

3.
This article develops a functional form of the generalized Poisson regression model that parametrically nests the Poisson and the two well-known generalized Poisson regression models (GP-1 and GP-2). The proposed model is applied to Malaysian motor insurance claim count data.

4.
In the presence of missing values, researchers may be interested in the rates of missing information. The rates of missing information (a) are important for assessing how the missing information contributes to inferential uncertainty about Q, the population quantity of interest, (b) are an important component in deciding the number of imputations, and (c) can be used to test model uncertainty and model fitting. In this article I will derive the asymptotic distribution of the rates of missing information in two scenarios: conventional multiple imputation (MI) and two-stage MI. Numerically, I will show that the proposed asymptotic distribution agrees with the simulated one. I will also suggest the number of imputations needed to obtain reliable missing information rate estimates for each method, based on the asymptotic distribution.
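
For orientation, a rate of missing information can be computed from m completed-data analyses with the standard Rubin combining rules. The Python sketch below assumes the common definition, the share of the total variance attributable to between-imputation variability; the function name and the numbers are hypothetical, and the estimator studied in the article (in particular for two-stage MI) may differ.

```python
from statistics import mean, variance

def missing_information_rate(estimates, variances):
    """Share of total variance attributable to missing data (Rubin's rules).

    estimates: point estimates of Q from each of the m completed data sets
    variances: the corresponding within-imputation variance estimates
    """
    m = len(estimates)
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return (1 + 1 / m) * b / t

# Hypothetical results from m = 5 imputations.
lam = missing_information_rate([10.0, 12.0, 11.0, 13.0, 9.0],
                               [4.0, 4.0, 4.0, 4.0, 4.0])
```

With these inputs the between-imputation variance is 2.5 and the total variance is 7, so `lam` equals 3/7.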

5.
Missing data are commonly encountered in self-reported measurements and questionnaires. It is crucial to treat missing values using an appropriate method to avoid bias and loss of power. Various types of imputation methods exist, but it is not always clear which method is preferred for imputation of data with non-normal variables. In this paper, we compare four imputation methods (mean imputation, quantile imputation, multiple imputation, and quantile regression multiple imputation (QRMI)), using both simulated and real data investigating factors affecting self-efficacy in breast cancer survivors. The results show an advantage for multiple imputation, especially QRMI, when data are not normal.

6.
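Multiple imputation of missing values of income variables based on chained equations   Total citations: 2, self-citations: 0, citations by others: 2
Liu Fengqin (刘凤芹), Statistical Research (《统计研究》), 2009, 26(1): 71-77
 In econometric analysis, missing values of income variables are a common problem that is difficult to handle. Traditional approaches often lead to systematic bias in the results. This paper proposes using multiple imputation based on chained equations to handle missing values of income variables. The method is applied to a real data set, its properties are discussed by analysing the imputed data sets, and it is compared with other multiple imputation methods. The results show that multiple imputation based on chained equations can, to some extent, correct the systematic bias in inferential results and gives appropriate standard deviation estimates.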
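
The chained-equations idea can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: exactly two continuous variables, a linear regression of each on the other, and normal noise with the residual standard deviation. The names `ols` and `chained_imputation` and the toy income/age data are hypothetical, not the article's implementation, and parameter uncertainty is ignored.

```python
import random

def ols(xs, ys):
    """Least squares for y = a + b*x; returns (a, b, residual sd)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    ss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (ss / max(n - 2, 1)) ** 0.5

def chained_imputation(data, n_iter=10, seed=0):
    """One completed data set for two variables; None marks a missing value."""
    rng = random.Random(seed)
    names = list(data)
    missing = {v: [i for i, x in enumerate(data[v]) if x is None] for v in names}
    # Step 1: start from mean-filled copies of each variable.
    filled = {}
    for v in names:
        obs = [x for x in data[v] if x is not None]
        m = sum(obs) / len(obs)
        filled[v] = [m if x is None else x for x in data[v]]
    # Step 2: cycle through the variables, re-imputing each from the other.
    for _ in range(n_iter):
        for target in names:
            other = names[1 - names.index(target)]
            rows = [i for i in range(len(filled[target])) if i not in missing[target]]
            a, b, sd = ols([filled[other][i] for i in rows],
                           [filled[target][i] for i in rows])
            for i in missing[target]:
                filled[target][i] = a + b * filled[other][i] + rng.gauss(0, sd)
    return filled

# Toy data (hypothetical): income in thousands, age in years.
data = {
    "income": [30.0, None, 52.0, 61.0, None, 45.0, 70.0],
    "age": [25.0, 31.0, None, 48.0, 39.0, 35.0, 55.0],
}
filled = chained_imputation(data)
```

Calling `chained_imputation` with several different seeds gives several completed data sets, which is the "multiple" part of multiple imputation; a full implementation would also draw the regression parameters rather than fix them at their estimates.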

7.
Liang and Zeger (1986) proposed an extension of generalized linear models to the analysis of longitudinal data. Their formulation requires the assumption of a common dispersion parameter across observation times. However, this assumption is not expected to hold in most situations. Park (1993) proposed a simple extension of Liang and Zeger's formulation that allows a different dispersion parameter at each time point. The proposed model is easy to apply without heavy computation and is useful for handling cases in which over-dispersion varies over time. In this paper, we focus on evaluating the effect of additional dispersion parameters on the estimators of model parameters. Through a Monte Carlo simulation study, the efficiency of Park's method is compared with that of Liang and Zeger's method.

8.
In this article, we consider statistical inference for longitudinal partial linear models when the response variable is sometimes missing, with a missingness probability that depends on a covariate measured with error. A generalized empirical likelihood (GEL) method is proposed by combining correction for attenuation and quadratic inference functions. The method, which takes the correlation within groups into consideration, is used to estimate the regression coefficients. Furthermore, residual-adjusted empirical likelihood (EL) is employed for estimating the baseline function so that undersmoothing is avoided. The empirical log-likelihood ratios are proven to be asymptotically chi-squared, and the corresponding confidence regions for the parameters of interest are then constructed. Compared with methods based on normal approximations (NAs), the GEL does not require consistent estimators for the asymptotic variance and bias. A numerical study is conducted to compare the performance of the EL and the normal approximation-based method, and a real example is analysed.

9.
This article presents findings from a case study of different approaches to the treatment of missing data. Simulations based on data from the Los Angeles Mammography Promotion in Churches Program (LAMP) led the authors to the following cautionary conclusions about the treatment of missing data: (1) Automated selection of the imputation model in the use of full Bayesian multiple imputation can lead to unexpected bias in coefficients of substantive models. (2) Under conditions that occur in actual data, casewise deletion can perform less well than we were led to expect by the existing literature. (3) Relatively unsophisticated imputations, such as mean imputation and conditional mean imputation, performed better than the technical literature led us to expect. (4) To underscore points (1), (2), and (3), the article concludes that imputation models are substantive models, and require the same caution with respect to specificity and calculability. The research reported here was partially supported by National Institutes of Health, National Cancer Institute, R01 CA65879 (SAF). We thank Nicholas Wolfinger, Naihua Duan, John Adams, John Fox, and the anonymous referees for their thoughtful comments on earlier drafts. The responsibility for any remaining errors is ours alone. Benjamin Stein was exceptionally helpful in orchestrating the simulations at the labs of UCLA Social Science Computing. Michael Mitchell of the UCLA Academic Technology Services Statistical Consulting Group artfully created Fig. 1 using the Stata graphics language; we are most grateful.

10.
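Imputation of missing values in categorical data based on the random forest model   Total citations: 6, self-citations: 1, citations by others: 6
Missing data are an important factor affecting the quality of questionnaire survey data, and imputing the missing values in questionnaires can significantly improve the quality of the survey data. Questionnaire data are mostly categorical. Classification algorithms from data mining are a common way to handle attribute classification problems, and the random forest model is one of the more accurate of these algorithms. This paper introduces the random forest model into the study of imputation for missing questionnaire data, proposes a random-forest-based method for imputing missing values in categorical data, and discusses the corresponding imputation steps for different missingness patterns. An empirical simulation comparison with other methods shows that the imputed values obtained by random forest imputation are more accurate and more reliable.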

11.
In longitudinal surveys where a number of observations have to be made on the same sampling unit at specified time intervals, it is not uncommon for observations at some of the time stages to be missing for some of the sampled units. In the present investigation, an estimation procedure is suggested for estimating the population total based on such incomplete data from multiple observations. The procedure makes use of all the available information and is seen to be more efficient than one based only on completely observed units. Estimators are also proposed for two other situations: first, when data are collected only for a sample of time stages, and second, when data are observed for only one time stage per sampled unit.

12.
Many analyses for incomplete longitudinal data are directed to examining the impact of covariates on the marginal mean responses. We consider the setting in which longitudinal responses are collected from individuals nested within clusters. We discuss methods for assessing covariate effects on the mean and association parameters when covariates are incompletely observed. Weighted first and second order estimating equations are constructed to obtain consistent estimates of mean and association parameters when covariates are missing at random. Empirical studies demonstrate that estimators from the proposed method have negligible finite sample biases in moderate samples. An application to the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) demonstrates the utility of the proposed method.

13.
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Contrary to the others, the proposed method can be easily used on data sets where the number of individuals is less than the number of variables and when the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller whilst ensuring a valid coverage.

14.
This article discusses regression analysis of current status data, which occur in many fields including cross-sectional studies, demographical investigations, and tumorigenicity experiments (Keiding, 1991; Sun, 2006). For the problem, we focus on the situation where the survival time of interest can be described by the additive hazards model and a multiple imputation approach is presented for inference. A major advantage of the approach is its simplicity and it can be easily implemented by using the existing software packages for right-censored failure time data. Extensive simulation studies are conducted and indicate that the approach performs well for practical situations and is comparable to the existing methods. The methodology is applied to a set of current status data arising from a tumorigenicity experiment and the model checking is discussed.

15.
Many large-scale sample surveys use panel designs under which sampled individuals are interviewed several times before being dropped from the sample. The longitudinal databases available from such surveys could be used to provide estimates of gross change over time. One problem in using these data to estimate gross change is how to handle the period-to-period nonresponse. This nonresponse is typically nonrandom and, furthermore, may be nonignorable in that it cannot be accounted for by other observed quantities in the data. Under the models proposed in this article, which are appropriate for the analysis of categorical data, the probability of nonresponse may be taken to be a function of the missing variable of interest. The proposed models are fit using maximum likelihood estimation. As an example, the method is applied to the problem of estimating gross flows in labor-force participation using data from the Current Population Survey and the Canadian Labour Force Survey.

16.
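Yu Lichao (于力超), Jin Yongjin (金勇进), Statistical Research (《统计研究》), 2018, 35(11): 93-104
Large-scale sample surveys mostly use complex sampling designs, which yield survey data sets with a hierarchical nested structure in which missing data are unavoidable. Imputation strategies for hierarchically structured data sets with missing values have so far rarely been studied. This paper applies the Gibbs algorithm to the multiple imputation of hierarchical data sets with missing values and studies both fixed-effects model imputation and random-effects model imputation. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, cluster sizes, and proportions of missing data, it compares the imputation performance of the methods in terms of the unbiasedness and efficiency of the parameter estimates, and gives recommendations for choosing the imputation model. The results show that parameter estimates are more accurate when a random-effects model is used as the imputation model, whereas a fixed-effects imputation model is comparatively simple to implement. The fixed-effects imputation model can be used when the proportion of missing data is small, the intraclass correlation coefficient is large, and the cluster size is large; otherwise, the random-effects imputation model is recommended.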

17.
The sensitivity of multiple imputation methods to deviations from their distributional assumptions is investigated using simulations, where the parameters of scientific interest are the coefficients of a linear regression model, and values in predictor variables are missing at random. The performance of a newly proposed imputation method based on generalized additive models for location, scale, and shape (GAMLSS) is investigated. Although imputation methods based on predictive mean matching are virtually unbiased, they suffer from mild to moderate under-coverage, even in the experiment where all variables are jointly normally distributed. The GAMLSS method features better coverage than currently available methods.
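
Predictive mean matching, the comparison method mentioned above, can be sketched as follows. This minimal Python version assumes one incomplete variable y, one complete covariate x, a linear prediction model, and k nearest donors; the function name `pmm_impute`, the parameter `k`, and the toy data are hypothetical. It fixes the regression coefficients at their estimates instead of drawing them, so it shows the matching step only, not a proper multiple imputation procedure.

```python
import random

def pmm_impute(xs, ys, k=3, seed=0):
    """Fill missing y (None) with the observed y of a randomly chosen donor.

    Donors are the k observed units whose predicted y is closest to the
    predicted y of the unit being imputed.
    """
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(obs)
    mx = sum(x for x, _ in obs) / n
    my = sum(y for _, y in obs) / n
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    slope = sum((x - mx) * (y - my) for x, y in obs) / sxx
    intercept = my - slope * mx
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            target = intercept + slope * x
            # Rank observed units by distance between predicted means.
            donors = sorted(
                obs, key=lambda d: abs(intercept + slope * d[0] - target)
            )[:k]
            y = rng.choice(donors)[1]  # borrow the donor's observed value
        completed.append(y)
    return completed

# Toy data (hypothetical): two of eight values are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]
completed = pmm_impute(xs, ys, k=3)
```

Because every imputed value is an actually observed value, the method cannot produce impossible values, which is one reason it is often described as robust to distributional misspecification.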

18.
In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the interleukin-6 genes is considered.

19.
In this article, we compare alternative methods for imputing missing data in the presence of ordinal data, in the framework of CUB (Combination of Uniform and (shifted) Binomial random variables) models. Various imputation methods are considered, including both univariate and multivariate approaches. The first step consists of running a simulation study, designed by varying the parameters of the CUB model, to compare CUB models with other imputation methods. We then use real datasets to compare our approach with some general imputation methods under various missing data mechanisms.

20.
A controlled clinical trial was conducted to investigate the efficacy of a chemical compound in the treatment of Premenstrual Dysphoric Disorder (PMDD). The data from the trial showed a non-monotone pattern of missing data and an ante-dependence covariance structure. A new analytical method for imputing the missing data with the ante-dependence covariance is proposed. The PMDD data are analysed by the non-imputation method and two imputation methods: the proposed method and the MCMC method.
