期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Evaluation of missing data mechanisms in two and three dimensional incomplete tables

Sayan Ghosh Palaniappan Vellaisamy 《Journal of the Korean Statistical Society》2019,48(2):297-313

The analysis of incomplete contingency tables is a practical and an interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies. 相似文献

2.

Data-driven sensitivity analysis to detect missing data mechanism with applications to structural equation modelling

Mortaza Jamshidian Ke-Hai Yuan 《Journal of Statistical Computation and Simulation》2013,83(7):1344-1362

Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because MNAR mechanism is not verifiable based on the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model and examine sensitivity to missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the the two data sets is used as a measure of sensitivity to missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap type method, that relies on heuristic input from the researcher, to test for the discrepancy of the parameter estimates. Instead of using bootstrap, the current article obtains confidence interval for parameter differences on two samples based on an asymptotic approximation. Because it does not use bootstrap, the developed procedure avoids likely convergence problems with the bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test missing at random in addition to MCAR. An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies are performed as well, using two methods of missing data generation, which show promise for the proposed sensitivity method. One method of missing data generation is also new and interesting in its own right. 相似文献

3.

Performance of standard imputation methods for missing quality of life data as covariate in survival analysis based on simulations from the International Breast Cancer Study Group Trials VI and VII*

Marion Procter Chris Robertson 《统计学通讯:模拟与计算》2013,42(10):3063-3077

Abstract

Imputation methods for missing data on a time-dependent variable within time-dependent Cox models are investigated in a simulation study. Quality of life (QoL) assessments were removed from the complete simulated datasets, which have a positive relationship between QoL and disease-free survival (DFS) and delayed chemotherapy and DFS, by missing at random and missing not at random (MNAR) mechanisms. Standard imputation methods were applied before analysis. Method performance was influenced by missing data mechanism, with one exception for simple imputation. The greatest bias occurred under MNAR and large effect sizes. It is important to carefully investigate the missing data mechanism. 相似文献

4.

A test of the missing data mechanism for repeated measures data

Taesung Park Seungyeoun Lee Robert F. Woolson 《统计学通讯:理论与方法》2013,42(10):2813-2829

The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spiril to the test of Park and Davis (1992). We derive the test for incomplete normally-distribrlted repeated measures data using linear mixed models. while Park and Davis (1992) cleirved thr test for incomplete repeatctl categorical data in the framework of Grizzle Starmer. and Koch (1969). Thr proposed procedure can be applied easily to any other multivariate general linear model which allow for missing data. The test is illustrated using the hip-replacernent patient.data from Crowder and Hand (1990). 相似文献

5.

Impact of missing data on type 1 error rates in non‐inferiority trials

Bongin Yoo 《Pharmaceutical statistics》2010,9(2):87-99

In this paper, a simulation study is conducted to systematically investigate the impact of different types of missing data on six different statistical analyses: four different likelihood‐based linear mixed effects models and analysis of covariance (ANCOVA) using two different data sets, in non‐inferiority trial settings for the analysis of longitudinal continuous data. ANCOVA is valid when the missing data are completely at random. Likelihood‐based linear mixed effects model approaches are valid when the missing data are at random. Pattern‐mixture model (PMM) was developed to incorporate non‐random missing mechanism. Our simulations suggest that two linear mixed effects models using unstructured covariance matrix for within‐subject correlation with no random effects or first‐order autoregressive covariance matrix for within‐subject correlation with random coefficient effects provide well control of type 1 error (T1E) rate when the missing data are completely at random or at random. ANCOVA using last observation carried forward imputed data set is the worst method in terms of bias and T1E rate. PMM does not show much improvement on controlling T1E rate compared with other linear mixed effects models when the missing data are not at random but is markedly inferior when the missing data are at random. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

6.

Empirical likelihood method for non-ignorable missing data problems

Zhong Guan Jing Qin 《Lifetime data analysis》2017,23(1):113-135

Missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missing is the most difficult missing data problem where the missing of a response depends on its own value. In statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the full parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)’s empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response which are shown to be asymptotically normal. Moreover the likelihood ratio statistic can be used to test whether the missing of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of a real AIDS trial data shows that the missing of CD4 counts around two years are non-ignorable and the sample mean based on observed data only is biased. 相似文献

7.

Impact of the non-distinctness and non-ignorability on the inference by multiple imputation in multivariate multilevel data: a simulation assessment

Recai Yucel 《Journal of Statistical Computation and Simulation》2017,87(9):1813-1826

Multiple imputation (MI) is an increasingly popular method for analysing incomplete multivariate data sets. One of the most crucial assumptions of this method relates to mechanism leading to missing data. Distinctness is typically assumed, which indicates a complete independence of mechanisms underlying missingness and data generation. In addition, missing at random or missing completely at random is assumed, which explicitly states under which conditions missingness is independent of observed data. Despite common use of MI under these assumptions, plausibility and sensitivity to these fundamental assumptions have not been well-investigated. In this work, we investigate the impact of non-distinctness and non-ignorability. In particular, non-ignorability is due to unobservable cluster-specific effects (e.g. random-effects). Through a comprehensive simulation study, we show that MI inferences suggest that nonignoriability due to non-distinctness do not immediately imply dismal performance while non-ignorability due to missing not at random leads to quite subpar performance. 相似文献

8.

Missing values: sparse inverse covariance estimation and?an?extension to sparse regression

Nicolas St?dler Peter Bühlmann 《Statistics and Computing》2012,22(1):219-235

We propose an ℓ ₁-regularized likelihood method for estimating the inverse covariance matrix in the high-dimensional multivariate normal model in presence of missing data. Our method is based on the assumption that the data are missing at random (MAR) which entails also the completely missing at random case. The implementation of the method is non-trivial as the observed negative log-likelihood generally is a complicated and non-convex function. We propose an efficient EM algorithm for optimization with provable numerical convergence properties. Furthermore, we extend the methodology to handle missing values in a sparse regression context. We demonstrate both methods on simulated and real data. 相似文献

9.

Handling of missing data in long‐term clinical trials: a case study

Mark Janssens Geert Molenberghs René Kerstens 《Pharmaceutical statistics》2012,11(6):442-448

Missing data in clinical trials is a well‐known problem, and the classical statistical methods used can be overly simple. This case study shows how well‐established missing data theory can be applied to efficacy data collected in a long‐term open‐label trial with a discontinuation rate of almost 50%. Satisfaction with treatment in chronically constipated patients was the efficacy measure assessed at baseline and every 3 months postbaseline. The improvement in treatment satisfaction from baseline was originally analyzed with a paired t‐test ignoring missing data and discarding the correlation structure of the longitudinal data. As the original analysis started from missing completely at random assumptions regarding the missing data process, the satisfaction data were re‐examined, and several missing at random (MAR) and missing not at random (MNAR) techniques resulted in adjusted estimate for the improvement in satisfaction over 12 months. Throughout the different sensitivity analyses, the effect sizes remained significant and clinically relevant. Thus, even for an open‐label trial design, sensitivity analysis, with different assumptions for the nature of dropouts (MAR or MNAR) and with different classes of models (selection, pattern‐mixture, or multiple imputation models), has been found useful and provides evidence towards the robustness of the original analyses; additional sensitivity analyses could be undertaken to further qualify robustness. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

10.

A Bayesian random effects model for analyzing mixed negative binomial and continuous longitudinal responses

E. Bahrami Samani M. Ganjali 《统计学通讯:理论与方法》2013,42(8):2354-2367

ABSTRACT

A general Bayesian random effects model for analyzing longitudinal mixed correlated continuous and negative binomial responses with and without missing data is presented. This Bayesian model, given some random effects, uses a normal distribution for the continuous response and a negative binomial distribution for the count response. A Markov Chain Monte Carlo sampling algorithm is described for estimating the posterior distribution of the parameters. This Bayesian model is illustrated by a simulation study. For sensitivity analysis to investigate the change of parameter estimates with respect to the perturbation from missing at random to not missing at random assumption, the use of posterior curvature is proposed. The model is applied to a medical data, obtained from an observational study on women, where the correlated responses are the negative binomial response of joint damage and continuous response of body mass index. The simultaneous effects of some covariates on both responses are also investigated. 相似文献

11.

Latent class based multiple imputation approach for missing categorical data

Mulugeta Gebregziabher Stacia M. DeSantis 《Journal of statistical planning and inference》2010

In this paper we propose a latent class based multiple imputation approach for analyzing missing categorical covariate data in a highly stratified data model. In this approach, we impute the missing data assuming a latent class imputation model and we use likelihood methods to analyze the imputed data. Via extensive simulations, we study its statistical properties and make comparisons with complete case analysis, multiple imputation, saturated log-linear multiple imputation and the Expectation–Maximization approach under seven missing data mechanisms (including missing completely at random, missing at random and not missing at random). These methods are compared with respect to bias, asymptotic standard error, type I error, and 95% coverage probabilities of parameter estimates. Simulations show that, under many missingness scenarios, latent class multiple imputation performs favorably when jointly considering these criteria. A data example from a matched case–control study of the association between multiple myeloma and polymorphisms of the Inter-Leukin 6 genes is considered. 相似文献

12.

Doubly robust estimation of partially linear models for longitudinal data with dropouts and measurement error in covariates

Huiming Lin Jiajia Zhang Wing K. Fung 《Statistics》2018,52(1):84-98

In longitudinal studies, missing responses and mismeasured covariates are commonly seen due to the data collection process. Without cautiousness in data analysis, inferences from the standard statistical approaches may lead to wrong conclusions. In order to improve the estimation for longitudinal data analysis, a doubly robust estimation method for partially linear models, which can simultaneously account for the missing responses and mismeasured covariates, is proposed. Imprecisions of covariates are corrected by taking advantage of the independence between replicate measurement errors, and missing responses are handled by the doubly robust estimation under the mechanism of missing at random. The asymptotic properties of the proposed estimators are established under regularity conditions, and simulation studies demonstrate desired properties. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study. 相似文献

13.

基于随机森林模型的分类数据缺失值插补

孟杰 ;李春林《统计与信息论坛》2014,(9):86-90

缺失数据是影响调查问卷数据质量的重要因素,对调查问卷中的缺失值进行插补可以显著提高调查数据的质量。调查问卷的数据类型多以分类型数据为主,数据挖掘技术中的分类算法是处理属性分类问题的常用方法,随机森林模型是众多分类算法中精度较高的方法之一。将随机森林模型引入调查问卷缺失数据的插补研究中,提出了基于随机森林模型的分类数据缺失值插补方法,并根据不同的缺失模式探讨了相应的插补步骤。通过与其它方法的实证模拟比较,表明随机森林插补法得到的插补值准确度更优、可信度更高。相似文献

14.

Hypothesis test for paired samples in the presence of missing data

Pablo Martínez-Camblor Norberto Corral Jesus María de la Hera 《Journal of applied statistics》2013,40(1):76-87

Missing data are present in almost all statistical analysis. In simple paired design tests, when some subject has one of the involved variables missing in the so-called partially overlapping samples scheme, it is usually discarded for the analysis. The lack of consistency between the information reported in the univariate and multivariate analysis is, perhaps, the main consequence. Although the randomness on the missing mechanism (missingness completely at random) is an usual and needed assumption for this particular situation, missing data presence could lead to serious inconsistencies on the reported conclusions. In this paper, the authors develop a simple and direct procedure which allows using the whole available information in order to perform paired tests. In particular, the proposed methodology is applied to check the equality among the means from two paired samples. In addition, the use of two different resampling techniques is also explored. Finally, real-world data are analysed. 相似文献

15.

The impact of missing data on the results of a schizophrenia study

下载免费PDF全文

Denis Rybin Gheorghe Doros Robert Rosenheck Robert Lew 《Pharmaceutical statistics》2015,14(1):4-10

Missing data pose a serious challenge to the integrity of randomized clinical trials, especially of treatments for prolonged illnesses such as schizophrenia, in which long‐term impact assessment is of great importance, but the follow‐up rates are often no more than 50%. Sensitivity analysis using Bayesian modeling for missing data offers a systematic approach to assessing the sensitivity of the inferences made on the basis of observed data. This paper uses data from an 18‐month study of veterans with schizophrenia to demonstrate this approach. Data were obtained from a randomized clinical trial involving 369 patients diagnosed with schizophrenia that compared long‐acting injectable risperidone with a psychiatrist's choice of oral treatment. Bayesian analysis utilizing a pattern‐mixture modeling approach was used to validate the reported results by detecting bias due to non‐random patterns of missing data. The analysis was applied to several outcomes including standard measures of schizophrenia symptoms, quality of life, alcohol use, and global mental status. The original study results for several measures were confirmed against a wide range of patterns of non‐random missingness. Robustness of the conclusions was assessed using sensitivity parameters. The missing data in the trial did not likely threaten the validity of previously reported results. Copyright © 2014 John Wiley & Sons, Ltd. 相似文献

16.

The impact of missing data and how it is handled on the rate of false-positive results in drug development

Barnes SA Mallinckrodt CH Lindborg SR Carter MK 《Pharmaceutical statistics》2008,7(3):215-225

In drug development, a common choice for the primary analysis is to assess mean changes via analysis of (co)variance with missing data imputed by carrying the last or baseline observations forward (LOCF, BOCF). These approaches assume that data are missing completely at random (MCAR). Multiple imputation (MI) and likelihood-based repeated measures (MMRM) are less restrictive as they assume data are missing at random (MAR). Nevertheless, LOCF and BOCF remain popular, perhaps because it is thought that the bias in these methods lead to protection against falsely concluding that a drug is more effective than the control. We conducted a simulation study that compared the rate of false positive results or regulatory risk error (RRE) from BOCF, LOCF, MI, and MMRM in 32 scenarios that were generated from a 2(5) full factorial arrangement with data missing due to a missing not at random (MNAR) mechanism. Both BOCF and LOCF inflated RRE were compared to MI and MMRM. In 12 of the 32 scenarios, BOCF yielded inflated RRE compared with eight scenarios for LOCF, three scenarios for MI and four scenarios for MMRM. In no situation did BOCF or LOCF provide adequate control of RRE when MI and MMRM did not. Both MI and MMRM are better choices than either BOCF or LOCF for the primary analysis. 相似文献

17.

Post-stratification based direct adjustment approach to a missing data problem in clinical trials

《Journal of statistical planning and inference》2001,96(1):247-262

In clinical trials we always expect some missing data. If data are missing completely at random, then missing data can be ignored for the purpose of statistical inference. In most situation, however, ignoring missing data will introduce bias. Adjustment is possible for missing data if the missing mechanism is known, which is rare in real problems. Our approach is to estimate directly the mean outcome of each treatment group in the presence of missing data. To this end, we post-stratify all the subjects by the expected value of outcome (or by a variable predictive of the outcome) so that subjects within a stratum may be considered homogeneous with respect to the expected outcome, and assume that subjects within a stratum are missing at random. We apply this post-stratification approach to a recently concluded clinical trial where a high proportion of data are missing and the missingness depends on the same factors affecting the outcome variable. A simulation study shows that the post-stratification approach reduces the bias substantially compared to the naive approach where only non-missing subjects are analyzed. 相似文献

18.

Analysis of ordinal outcomes with longitudinal covariates subject to missingness

Melody S. Goodman Yi Li Anne M. Stoddard Glorian Sorensen 《Journal of applied statistics》2014,41(5):1040-1052

We propose a mixture model for data with an ordinal outcome and a longitudinal covariate that is subject to missingness. Data from a tailored telephone delivered, smoking cessation intervention for construction laborers are used to illustrate the method, which considers as an outcome a categorical measure of smoking cessation, and evaluates the effectiveness of the motivational telephone interviews on this outcome. We propose two model structures for the longitudinal covariate, for the case when the missing data are missing at random, and when the missing data mechanism is non-ignorable. A generalized EM algorithm is used to obtain maximum likelihood estimates. 相似文献

19.

Estimation Bias in Complete-Case Analysis in Crossover Studies with Missing Data

Fang Liu 《统计学通讯:理论与方法》2013,42(5):812-827

Crossover designs are used often in clinical trials. It is not uncommon that subjects discontinue before completing all treatment periods in a crossover study. Despite availability of statistical methodologies utilizing all available data and software for obtaining valid inferences under the assumption of missing at random (MAR), naïve approaches, such as the complete case (CC) analysis, which is only valid with a strong assumption of missing completely at random are still widely used in practice. In this article, we obtain the analytical form of the estimation bias of treatment effects with CC for linear mixed models. We use simulation studies to examine the inflation of Type I error and efficiency loss in the inferences with CC under MAR. Invalidity and inefficiency of two other commonly used approaches for defining analyzed data in the presence of missing data, including data from at least two periods in three period crossover and available cases for a specific comparison of interest, are also demonstrated through simulation studies. 相似文献

20.

E. Bahrami Samani M. Ganjali 《统计学通讯:理论与方法》2014,43(13):2659-2673

Regression models with random effects are proposed for joint analysis of negative binomial and ordinal longitudinal data with nonignorable missing values under fully parametric framework. The presented model simultaneously considers a multivariate probit regression model for the missing mechanisms, which provides the ability of examining the missing data assumptions and a multivariate mixed model for the responses. Random effects are used to take into account the correlation between longitudinal responses of the same individual. A full likelihood-based approach that allows yielding maximum likelihood estimates of the model parameters is used. The model is applied to a medical data, obtained from an observational study on women, where the correlated responses are the ordinal response of osteoporosis of the spine and negative binomial response is the number of joint damage. A sensitivity of the results to the assumptions is also investigated. The effect of some covariates on all responses are investigated simultaneously. 相似文献