1.

Inference and Missing Data: Asymptotic Results

Søren Feodor Nielsen 《Scandinavian Journal of Statistics》1997,24(2):261-274

In Rubin (1976) the missing at random (MAR) and missing completely at random (MCAR) conditions are discussed. It is concluded that the MAR condition allows one to ignore the missing data mechanism when doing likelihood or Bayesian inference but also that the stronger MCAR condition is in some sense the weakest generally sufficient condition allowing (conditional) frequentist inference while ignoring the missing data mechanism. In this paper it is shown that (a slightly strengthened version of) the MAR condition is sufficient to yield ordinary large sample results for estimators and test statistics and thus may be used for (asymptotic) frequentist inference.
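The practical difference between MCAR and MAR can be illustrated with a small simulation. The sketch below is not taken from the paper; the data-generating model, the missingness probabilities and the function name `simulate` are all assumptions chosen for illustration. It generates a fully observed covariate x and an outcome y = x + noise, deletes y either completely at random or with a probability that depends only on x, and compares a complete-case mean with a regression-based estimate that uses the observed x.

```python
import math
import random


def simulate(n=20000, seed=0):
    """Compare estimators of E[y] (true value 0) under MCAR and MAR missingness."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]

    # MCAR: each y is observed with probability 0.5, independent of everything.
    seen_mcar = [rng.random() < 0.5 for _ in xs]
    # MAR: the chance of observing y depends only on the always-observed x
    # (illustrative choice: large x makes y less likely to be observed).
    seen_mar = [rng.random() < 1.0 / (1.0 + math.exp(2.0 * x)) for x in xs]

    def complete_case_mean(seen):
        obs = [y for y, s in zip(ys, seen) if s]
        return sum(obs) / len(obs)

    def regression_mean(seen):
        # Fit y = a + b*x on complete cases, then average predictions over all x.
        pairs = [(x, y) for x, y, s in zip(xs, ys, seen) if s]
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        b = sxy / sxx
        a = my - b * mx
        return sum(a + b * x for x in xs) / n

    return {
        "mcar_complete_case": complete_case_mean(seen_mcar),
        "mar_complete_case": complete_case_mean(seen_mar),
        "mar_regression": regression_mean(seen_mar),
    }


results = simulate()
```

Under these assumptions the complete-case mean stays close to zero under MCAR but is pulled well below zero under MAR, while the estimate that models y given x remains close to zero. This is only an illustration of why the mechanism can be ignored for model-based inference under MAR, not a demonstration of the paper's asymptotic results.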

2.

The impact of dichotomization in longitudinal data analysis: a simulation study

Bongin Yoo 《Pharmaceutical statistics》2010,9(4):298-312

In this paper, a simulation study is conducted to systematically investigate the impact of dichotomizing longitudinal continuous outcome variables under various types of missing data mechanisms. Generalized linear models (GLM) with standard generalized estimating equations (GEE) are widely used for longitudinal outcome analysis, but these semi‐parametric approaches are only valid under missing data completely at random (MCAR). Alternatively, weighted GEE (WGEE) and multiple imputation GEE (MI‐GEE) were developed to ensure validity under missing at random (MAR). Using a simulation study, the performance of standard GEE, WGEE and MI‐GEE on incomplete longitudinal dichotomized outcome analysis is evaluated. For comparisons, likelihood‐based linear mixed effects models (LMM) are used for incomplete longitudinal original continuous outcome analysis. Focusing on dichotomized outcome analysis, MI‐GEE with original continuous missing data imputation procedure provides well controlled test sizes and more stable power estimates compared with any other GEE‐based approaches. It is also shown that dichotomizing longitudinal continuous outcome will result in substantial loss of power compared with LMM. Copyright © 2009 John Wiley & Sons, Ltd.

3.

Evaluation of missing data mechanisms in two and three dimensional incomplete tables

Sayan Ghosh Palaniappan Vellaisamy 《Journal of the Korean Statistical Society》2019,48(2):297-313

The analysis of incomplete contingency tables is a practical and interesting problem. In this paper, we provide characterizations for the various missing mechanisms of a variable in terms of response and non-response odds for two and three dimensional incomplete tables. Log-linear parametrization and some distinctive properties of the missing data models for the above tables are discussed. All possible cases in which data on one, two or all variables may be missing are considered. We study the missingness of each variable in a model, which is more insightful for analyzing cross-classified data than the missingness of the outcome vector. For sensitivity analysis of the incomplete tables, we propose easily verifiable procedures to evaluate the missing at random (MAR), missing completely at random (MCAR) and not missing at random (NMAR) assumptions of the missing data models. These methods depend only on joint and marginal odds computed from fully and partially observed counts in the tables, respectively. Finally, some real-life datasets are analyzed to illustrate our results, which are confirmed based on simulation studies.

4.

Different methods for handling incomplete longitudinal binary outcome due to missing at random dropout

《Statistical Methodology》2015

This paper compares the performance of weighted generalized estimating equations (WGEEs), multiple imputation based on generalized estimating equations (MI-GEEs) and generalized linear mixed models (GLMMs) for analyzing incomplete longitudinal binary data when the underlying study is subject to dropout. The paper aims to explore the performance of the above methods in terms of handling dropouts that are missing at random (MAR). The methods are compared on simulated data. The longitudinal binary data are generated from a logistic regression model, under different sample sizes. The incomplete data are created for three different dropout rates. The methods are evaluated in terms of bias, precision and mean square error in the case where data are subject to MAR dropout. In conclusion, across the simulations performed, the MI-GEE method performed better in both small and large sample sizes. Evidently, this should not be seen as formal and definitive proof, but adds to the body of knowledge about the methods’ relative performance. In addition, the methods are compared using data from a randomized clinical trial.

5.

Power and sample size for GEE analysis of incomplete paired outcomes in 2 × 2 crossover trials

Yongqiang Tang 《Pharmaceutical statistics》2021,20(4):820-839

The 2 × 2 crossover trial uses subjects as their own control to reduce the intersubject variability in the treatment comparison, and typically requires fewer subjects than a parallel design. The generalized estimating equations (GEE) methodology has been commonly used to analyze incomplete discrete outcomes from crossover trials. We propose a unified approach to the power and sample size determination for the Wald Z-test and t-test from GEE analysis of paired binary, ordinal and count outcomes in crossover trials. The proposed method allows misspecification of the variance and correlation of the outcomes, missing outcomes, and adjustment for the period effect. We demonstrate that misspecification of the working variance and correlation functions leads to no or minimal efficiency loss in GEE analysis of paired outcomes. In general, GEE requires the assumption of missing completely at random. For bivariate binary outcomes, we show by simulation that the GEE estimate is asymptotically unbiased or only minimally biased, and the proposed sample size method is suitable under missing at random (MAR) if the working correlation is correctly specified. The performance of the proposed method is illustrated with several numerical examples. Adaptation of the method to other paired outcomes is discussed.

6.

Bayesian estimation of the complete sample size from an incomplete Poisson sample

W.O. Williford W.Jan Shan 《Communications in Statistics - Theory and Methods》2013,42(7):835-846

For the Poisson distribution, a posterior distribution for the complete sample size, N, is derived from an incomplete sample when any specified subset of the classes is missing. Means as well as other posterior characteristics of N are obtained for two examples with various classes removed. For the special case of a truncated ‘missing zero class’ Poisson sample a simulation experiment is performed for the small ‘N=25’ sample situation applying both Bayesian and maximum likelihood methods of estimation.
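For the ‘missing zero class’ case, the maximum likelihood side of the comparison can be sketched as follows. This is the textbook conditional MLE for a zero-truncated Poisson sample, not necessarily the exact estimator used in the paper, and the function names `truncated_poisson_mle` and `estimate_complete_size` are hypothetical. The rate λ solves λ / (1 − e^(−λ)) = mean of the observed (nonzero) counts, and the complete sample size is then estimated as n / (1 − e^(−λ)), where n is the number of observed counts.

```python
import math


def truncated_poisson_mle(observed_mean, tol=1e-10):
    """Solve lam / (1 - exp(-lam)) = observed_mean for lam by bisection."""
    if observed_mean <= 1.0:
        raise ValueError("a zero-truncated Poisson sample has mean above 1")
    # The left-hand side is increasing in lam, tends to 1 as lam -> 0 and
    # exceeds observed_mean at lam = observed_mean, so the root is bracketed.
    lo, hi = 1e-12, observed_mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < observed_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def estimate_complete_size(n_observed, observed_mean):
    """Scale the observed count up by the estimated probability of a nonzero value."""
    lam = truncated_poisson_mle(observed_mean)
    return n_observed / -math.expm1(-lam)
```

For example, `estimate_complete_size(20, 2.313)` treats 20 nonzero counts with mean 2.313 as a truncated sample and returns an estimate of N that includes the unobserved zeros.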

7.

Nonparametric estimation for survival data with censoring indicators missing at random

E. Brunel F. Comte A. Guilloux 《Journal of statistical planning and inference》2013

In this paper, we consider the problem of hazard rate estimation in the presence of covariates, for survival data with censoring indicators missing at random. We propose, in the context usually denoted by MAR (missing at random, in opposition to MCAR, missing completely at random, which requires an additional independence assumption), nonparametric adaptive strategies based on model selection methods for estimators admitting finite dimensional developments in functional orthonormal bases. Theoretical risk bounds are provided; they show that the estimators behave well in terms of mean integrated squared error (MISE). Simulation experiments illustrate the statistical procedure.

8.

An application of nonparametric regression to missing data in large market surveys

Gary Madden Paul Rappoport Aniruddha Banerjee 《Journal of applied statistics》2018,45(7):1292-1302

Non-response (or missing data) is often encountered in large-scale surveys. To enable the behavioural analysis of these data sets, statistical treatments are commonly applied to complete or remove these data. However, the correctness of such procedures critically depends on the nature of the underlying missingness generation process. Clearly, the efficacy of applying either case deletion or imputation procedures rests on the unknown missingness generation mechanism. The contribution of this paper is twofold. The study is the first to propose a simple sequential method to attempt to identify the form of missingness. Second, the effectiveness of the tests is assessed by generating (experimentally) nine missing data sets by imposed MCAR, MAR and NMAR processes, with data removed.

9.

Model selection information criteria in latent class models with missing data and contingency question

《Journal of Statistical Computation and Simulation》2012,82(1):159-170

Latent class analysis (LCA) has been found to have important applications in social and behavioural sciences for modelling categorical response variables, and non-response is typical when collecting data. In this study, the non-response mainly included ‘contingency questions’ and real ‘missing data’. The primary objective of this study was to evaluate the effects of some potential factors on model selection indices in LCA with non-response data. We simulated missing data with contingency question and evaluated the accuracy rates of eight information criteria for selecting the correct models. The results showed that the main factors are latent class proportions, conditional probabilities, sample size, the number of items, the missing data rate and the contingency data rate. Interactions of the conditional probabilities with class proportions, sample size and the number of items are also significant. From our simulation results, the impact of missing data and contingency questions can be amended by increasing the sample size or the number of items.

10.

A test of the missing data mechanism for repeated measures data

Taesung Park Seungyeoun Lee Robert F. Woolson 《Communications in Statistics - Theory and Methods》2013,42(10):2813-2829

The occurrence of missing data is an often unavoidable consequence of repeated measures studies. Fortunately, multivariate general linear models such as growth curve models and linear mixed models with random effects have been well developed to analyze incomplete normally-distributed repeated measures data. Most statistical methods have assumed that the missing data occur at random. This assumption may include two types of missing data mechanism: missing completely at random (MCAR) and missing at random (MAR) in the sense of Rubin (1976). In this paper, we develop a test procedure for distinguishing these two types of missing data mechanism for incomplete normally-distributed repeated measures data. The proposed test is similar in spirit to the test of Park and Davis (1992). We derive the test for incomplete normally-distributed repeated measures data using linear mixed models, while Park and Davis (1992) derived the test for incomplete repeated categorical data in the framework of Grizzle, Starmer, and Koch (1969). The proposed procedure can be applied easily to any other multivariate general linear model which allows for missing data. The test is illustrated using the hip-replacement patient data from Crowder and Hand (1990).

11.

Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data

Andrew J. Copas Shaun R. Seaman 《Journal of applied statistics》2010,37(6):911-922

Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may be plausibly missing at random (MAR), i.e. unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, or an imputation approach, or a doubly robust approach based on weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and also GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios but in others are substantially biased. Hence, we recommend the modified GEE be used with caution.

12.

Model selection of generalized partially linear models with missing covariates

Ying-Zi Fu Xue-Dong Chen 《Journal of statistical planning and inference》2012,142(1):126-138

In this paper, a generalized partially linear model (GPLM) with missing covariates is studied and a Monte Carlo EM (MCEM) algorithm with penalized-spline (P-spline) technique is developed to estimate the regression coefficients and nonparametric function, respectively. As classical model selection procedures such as Akaike's information criterion become invalid for our considered models with incomplete data, some new model selection criteria for GPLMs with missing covariates are proposed under two different missingness mechanisms, namely missing at random (MAR) and missing not at random (MNAR). The most attractive point of our method is that it is rather general and can be extended to various situations with missing observations based on the EM algorithm; in particular, when no missing data are involved, our new model selection criteria reduce to the classical AIC. Therefore, we can not only compare models with missing observations under MAR/MNAR settings, but also compare missing data models with complete-data models simultaneously. Theoretical properties of the proposed estimator, including consistency of the model selection criteria, are investigated. A simulation study and a real example are used to illustrate the proposed methodology.

13.

Partial identification of average treatment effects on the treated through difference-in-differences

Yanqin Fan Carlos A. Manzanares 《Econometric Reviews》2017,36(6-9):1057-1080

The difference-in-differences (DID) method is widely used as a tool for identifying causal effects of treatments in program evaluation. When panel data sets are available, it is well-known that the average treatment effect on the treated (ATT) is point-identified under the DID setup. If a panel data set is not available, repeated cross sections (pretreatment and posttreatment) may be used, but may not point-identify the ATT. This paper systematically studies the identification of the ATT under the DID setup when posttreatment treatment status is unknown for the pretreatment sample. This is done through a novel application of an extension of a continuous version of the classical monotone rearrangement inequality which allows for general copula bounds. The identifying power of an instrumental variable and of a ‘matched subsample’ is also explored. Finally, we illustrate our approach by estimating the effect of the Americans with Disabilities Act of 1991 on employment outcomes of the disabled.
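For orientation, the point-identified case mentioned in the abstract corresponds to the standard two-group, two-period DID contrast. The sketch below shows only that baseline computation; it does not implement the paper's partial identification bounds, and the function name `did_att` and its argument layout are assumptions made for illustration.

```python
def did_att(treated_pre, treated_post, control_pre, control_post):
    """Standard DID contrast: change in the treated group minus change in the controls."""

    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

With outcomes `[1, 2, 3]` then `[4, 5, 6]` for the treated group and `[1, 1, 1]` then `[2, 2, 2]` for the controls, the treated group changes by 3 and the controls by 1, so the contrast is 2. Computing the treated-group pretreatment mean requires knowing which pretreatment units are later treated, which is exactly the information the paper assumes is unavailable.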

14.

Modeling dropouts by conditional distribution, a copula-based approach

Ene Käärik Meelis Käärik 《Journal of statistical planning and inference》2009,139(11):3830

In this paper the concept of copulas is implemented into the methodology for solving the imputation problem in correlated incomplete data. We use the Gaussian copula as an alternative to the joint distribution for modeling the conditional distribution, conditioned on the observed values of measurements. The general formula for imputation and its application for compound symmetry correlation structure are given.

15.

The impact of missing data and how it is handled on the rate of false-positive results in drug development

Barnes SA Mallinckrodt CH Lindborg SR Carter MK 《Pharmaceutical statistics》2008,7(3):215-225

In drug development, a common choice for the primary analysis is to assess mean changes via analysis of (co)variance with missing data imputed by carrying the last or baseline observations forward (LOCF, BOCF). These approaches assume that data are missing completely at random (MCAR). Multiple imputation (MI) and likelihood-based repeated measures (MMRM) are less restrictive as they assume data are missing at random (MAR). Nevertheless, LOCF and BOCF remain popular, perhaps because it is thought that the bias in these methods leads to protection against falsely concluding that a drug is more effective than the control. We conducted a simulation study that compared the rate of false positive results or regulatory risk error (RRE) from BOCF, LOCF, MI, and MMRM in 32 scenarios that were generated from a 2^5 full factorial arrangement with data missing due to a missing not at random (MNAR) mechanism. Both BOCF and LOCF inflated RRE compared with MI and MMRM. In 12 of the 32 scenarios, BOCF yielded inflated RRE compared with eight scenarios for LOCF, three scenarios for MI and four scenarios for MMRM. In no situation did BOCF or LOCF provide adequate control of RRE when MI and MMRM did not. Both MI and MMRM are better choices than either BOCF or LOCF for the primary analysis.
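The two single-imputation rules named in the abstract are simple to state in code. The sketch below is illustrative only: it assumes one subject's visits are stored as a list with `None` marking a missing value and a non-missing baseline in the first position, and the function names `locf` and `bocf` are hypothetical.

```python
def locf(visits):
    """Last observation carried forward: reuse the most recent observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(visits):
    """Baseline observation carried forward: replace every gap with the first value."""
    baseline = visits[0]  # assumed to be observed
    return [baseline if value is None else value for value in visits]
```

For a subject observed at 10.0 and 8.0 who then drops out for two visits, `locf` returns `[10.0, 8.0, 8.0, 8.0]` and `bocf` returns `[10.0, 8.0, 10.0, 10.0]`. Both rules fill each gap with a single fixed value, so neither reflects the uncertainty about what was not observed.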

16.

On Jointly Estimating Parameters and Missing Data by Maximizing the Complete-Data Likelihood

Roderick J.A. Little Donald B. Rubin 《The American statistician》2013,67(3):218-220

One approach to handling incomplete data occasionally encountered in the literature is to treat the missing data as parameters and to maximize the complete-data likelihood over the missing data and parameters. This article points out that although this approach can be useful in particular problems, it is not a generally reliable approach to the analysis of incomplete data. In particular, it does not share the optimal properties of maximum likelihood estimation, except under the trivial asymptotics in which the proportion of missing data goes to zero as the sample size increases.

17.

Directional dependence via Gaussian copula beta regression model with asymmetric GARCH marginals

Jong-Min Kim 《Communications in Statistics - Simulation and Computation》2017,46(10):7639-7653

This article proposes a new directional dependence by using the Gaussian copula beta regression model. In particular, we consider an asymmetric Generalized AutoRegressive Conditional Heteroscedasticity (GARCH) model for the marginal distribution of standardized residuals to transform data exhibiting conditional heteroscedasticity into a white noise process. With the simulated data generated by an asymmetric bivariate copula, we verify our proposed directional dependence method. For the multivariate directional dependence by using the Gaussian copula beta regression model, we employ a three-dimensional Archimedean copula to generate trivariate data and then show the directional dependence for one random variable given two other random variables. With West Texas Intermediate Daily Price (WTI) and the Standard & Poor’s 500 (S&P 500), our proposed directional dependence by the Gaussian copula beta regression model reveals that the directional dependence from WTI to S&P 500 is greater than that from S&P 500 to WTI. To validate our empirical result, the Granger causality test is conducted, confirming the same result produced by our method.

18.

Asymmetric Forecast Densities for U.S. Macroeconomic Variables from a Gaussian Copula Model of Cross-Sectional and Serial Dependence

Michael S. Smith Shaun P. Vahey 《Journal of Business & Economic Statistics》2016,34(3):416-434

Most existing reduced-form macroeconomic multivariate time series models employ elliptical disturbances, so that the forecast densities produced are symmetric. In this article, we use a copula model with asymmetric margins to produce forecast densities with the scope for severe departures from symmetry. Empirical and skew t distributions are employed for the margins, and a high-dimensional Gaussian copula is used to jointly capture cross-sectional and (multivariate) serial dependence. The copula parameter matrix is given by the correlation matrix of a latent stationary and Markov vector autoregression (VAR). We show that the likelihood can be evaluated efficiently using the unique partial correlations, and estimate the copula using Bayesian methods. We examine the forecasting performance of the model for four U.S. macroeconomic variables between 1975:Q1 and 2011:Q2 using quarterly real-time data. We find that the point and density forecasts from the copula model are competitive with those from a Bayesian VAR. During the recent recession the forecast densities exhibit substantial asymmetry, avoiding some of the pitfalls of the symmetric forecast densities from the Bayesian VAR. We show that the asymmetries in the predictive distributions of GDP growth and inflation are similar to those found in the probabilistic forecasts from the Survey of Professional Forecasters. Last, we find that unlike the linear VAR model, our fitted Gaussian copula models exhibit nonlinear dependencies between some macroeconomic variables. This article has online supplementary material.

19.

On the Correlation Structure of Gaussian Copula Models for Geostatistical Count Data

Zifei Han Victor De Oliveira 《Australian & New Zealand Journal of Statistics》2016,58(1):47-69

We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero‐inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.

20.

An R package for the simulation of correlated discrete variables

Alessandro Barbiero Pier Alda Ferrari 《Communications in Statistics - Simulation and Computation》2017,46(7):5123-5140

A package for the stochastic simulation of discrete variables with assigned marginal distributions and correlation matrix is presented and discussed. The simulating mechanism relies upon the Gaussian copula, linking the discrete distributions together, and an iterative scheme recovering the correlation matrix for the copula that ensures the desired correlations among the discrete variables. Examples of its use are provided as well as three possible applications (related to probability, sampling, and inference), which illustrate the utility of the package as an efficient and easy-to-use tool both in statistical research and for didactic purposes.
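The Gaussian copula step described in the abstract can be sketched in a few lines. The package itself is written in R; the Python sketch below is an independent illustration for two variables, with hypothetical function names (`quantile`, `simulate_pairs`, `correlation`) and arbitrarily chosen marginals. It draws a correlated pair of standard normals, maps each to a uniform through the normal CDF, and then applies the inverse CDF of the target discrete distribution. It does not reproduce the iterative scheme that tunes the copula correlation to hit a requested discrete correlation.

```python
import math
import random
from statistics import NormalDist


def quantile(pmf, u):
    """Inverse CDF of a discrete distribution on 0, 1, ..., len(pmf) - 1."""
    cumulative = 0.0
    for value, p in enumerate(pmf):
        cumulative += p
        if u <= cumulative:
            return value
    return len(pmf) - 1  # guards against rounding in the cumulative sum


def simulate_pairs(pmf_a, pmf_b, latent_rho, n, seed=0):
    """Draw n pairs whose dependence comes from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second normal with correlation latent_rho to the first.
        z2 = latent_rho * z1 + math.sqrt(1.0 - latent_rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((quantile(pmf_a, phi(z1)), quantile(pmf_b, phi(z2))))
    return pairs


def correlation(pairs):
    """Pearson correlation of the two coordinates."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)


pairs = simulate_pairs([0.2, 0.5, 0.3], [0.6, 0.4], latent_rho=0.7, n=20000)
```

The simulated marginals match the requested probabilities up to sampling error, but the correlation between the discrete variables is weaker than the latent correlation of 0.7. That gap is why a scheme for recovering the copula correlation matrix, as described in the abstract, is needed when a specific discrete correlation is required.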