1.

Strategies for handling missing data in longitudinal studies with questionnaires

Nazanin Nooraee Geert Molenberghs Johan Ormel 《Journal of Statistical Computation and Simulation》2018,88(17):3415-3436

Missing data methods, maximum likelihood estimation (MLE) and multiple imputation (MI), for longitudinal questionnaire data were investigated via simulation. Predictive mean matching (PMM) was applied at both item and scale levels, logistic regression at item level and multivariate normal imputation at scale level. We investigated a hybrid approach which is a combination of MLE and MI, i.e. scales from the imputed data are eliminated if all underlying items were originally missing. Bias and mean square error (MSE) for parameter estimates were examined. MLE occasionally provided the best results in terms of bias, but hardly ever in terms of MSE. All imputation methods at the scale level and logistic regression at item level hardly ever showed the best performance. The hybrid approach performed similarly to or better than its original MI counterpart. The PMM-hybrid approach at item level demonstrated the best MSE for most settings and in some cases also the smallest bias.
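As a rough illustration of the predictive mean matching idea named in this abstract, the sketch below imputes a missing outcome by borrowing the observed value of a donor whose regression prediction is close. It is a simplified single-imputation sketch, not the authors' procedure: the function name `pmm_impute`, the single covariate `x`, the donor pool size `k`, and the toy data are all assumptions, and a full MI implementation would also draw the regression parameters afresh for each imputed data set.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Fill missing entries of y (None) by predictive mean matching on x.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Observed cases serve both to fit the regression and as the donor pool.
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mean_x = sum(xi for xi, _ in obs) / n
    mean_y = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(value):
        return intercept + slope * value

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)  # observed values are left untouched
            continue
        target = predict(xi)
        # The k donors whose predicted means are closest to the target.
        donors = sorted(obs, key=lambda d: abs(predict(d[0]) - target))[:k]
        # Impute an actually observed value, never the prediction itself.
        completed.append(rng.choice(donors)[1])
    return completed


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, None, 2.9, 4.2, None, 6.1]
completed = pmm_impute(x, y, k=2)
```

Because every imputed value is a real observed value, the completed data stay within the range and granularity of the original item or scale scores.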

2.

Estimating Transition Probabilities for Ignorable Intermittent Missing Data in a Discrete-Time Markov Chain

Hung-Wen Yeh Wenyaw Chan Elaine Symanski Barry R. Davis 《Communications in Statistics - Simulation and Computation》2013,42(2):433-448

This article considers a discrete-time Markov chain for modeling transition probabilities when multiple successive observations are missing at random between two observed outcomes using three methods: a naïve analog of complete-case analysis using the observed one-step transitions alone, a non data-augmentation method (NL) by solving nonlinear equations, and a data-augmentation method, the Expectation-Maximization (EM) algorithm. The explicit form of the conditional log-likelihood given the observed information as required by the E step is provided, and the iterative formula in the M step is expressed in a closed form. An empirical study was performed to examine the accuracy and precision of the estimates obtained in the three methods under ignorable missing mechanisms of missing completely at random and missing at random. A dataset from the mental health arena was used for illustration. It was found that both data-augmentation and nonaugmentation methods provide accurate and precise point estimation, and that the naïve method resulted in estimates of the transition probabilities with similar bias but larger MSE. The NL method and the EM algorithm in general provide similar results whereas the latter provides conditional expected row margins leading to smaller standard errors.
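The naïve method mentioned in this abstract, which uses only the observed one-step transitions, can be sketched as follows. This is an illustrative reading of that description rather than the authors' code: the function name `naive_transition_matrix`, the encoding of states as integers and of missing observations as `None`, and the toy chains are assumptions.

```python
def naive_transition_matrix(chains, n_states):
    """Estimate transition probabilities from observed one-step transitions.

    A pair of consecutive time points contributes only if both states are
    observed; any pair touching a missing value (None) is discarded.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for chain in chains:
        for current, following in zip(chain, chain[1:]):
            if current is not None and following is not None:
                counts[current][following] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions are left as zeros.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


chains = [
    [0, 0, 1, None, 1, 0],
    [1, None, None, 0, 0, 1],
]
P = naive_transition_matrix(chains, 2)
```

The discarded pairs are exactly the information that the NL and EM methods in the abstract try to recover, which is consistent with the naïve estimator's larger MSE reported there.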

3.

Probability density estimation with data missing at random when covariables are present

Qihua Wang 《Journal of statistical planning and inference》2008

This paper addresses the problem of probability density estimation in the presence of covariates when data are missing at random (MAR). The inverse probability weighted method is used to define nonparametric and semiparametric weighted probability density estimators. A regression calibration technique is also used to define an imputed estimator. It is shown that all the estimators are asymptotically normal with the same asymptotic variance as that of the inverse probability weighted estimator with known selection probability function and weights. Also, we establish the mean squared error (MSE) bounds and obtain the MSE convergence rates. A simulation is carried out to assess the proposed estimators in terms of the bias and standard error.

4.

Generalized least squares estimation of multivariate nonlinear models with missing data

Howard R. Siepman Shie-Shien Yang 《Communications in Statistics - Theory and Methods》2013,42(6):1565-1579

The method of estimated generalized least squares estimation of multiple response models is extended to the randomly missing data case. This estimation procedure is computationally simple when there are many missing data but the number of distinct patterns of missing data for the response vectors is small. The consistency and asymptotic normality of the proposed estimators are established.

5.

A Bayesian approach with generalized ridge estimation for high-dimensional regression and testing

Szu-Peng Yang 《Communications in Statistics - Simulation and Computation》2017,46(8):6083-6105

This paper adopts a Bayesian strategy for generalized ridge estimation for high-dimensional regression. We also consider significance testing based on the proposed estimator, which is useful for selecting regressors. Both theoretical and simulation studies show that the proposed estimator can simultaneously outperform the ordinary ridge estimator and the LSE in terms of the mean square error (MSE) criterion. The simulation study also demonstrates the competitive MSE performance of our proposal with the Lasso under sparse models. We demonstrate the method using the lung cancer data involving high-dimensional microarrays.

6.

Variable selection for high-dimensional generalized linear model with block-missing data

Yifan He Yang Feng Xinyuan Song 《Scandinavian Journal of Statistics》2023,50(3):1279-1297

In modern scientific research, multiblock missing data emerges with synthesizing information across multiple studies. However, existing imputation methods for handling block-wise missing data either focus on the single-block missing pattern or heavily rely on the model structure. In this study, we propose a single regression-based imputation algorithm for multiblock missing data. First, we conduct a sparse precision matrix estimation based on the structure of block-wise missing data. Second, we impute the missing blocks with their means conditional on the observed blocks. Theoretical results about variable selection and estimation consistency are established in the context of a generalized linear model. Moreover, simulation studies show that compared with existing methods, the proposed imputation procedure is robust to various missing mechanisms because of the good properties of regression imputation. An application to Alzheimer's Disease Neuroimaging Initiative data also confirms the superiority of our proposed method.

7.

Using data augmentation to correct for non-ignorable non-response when surrogate data are available: an application to the distribution of hourly pay

Gabriele B. Durrant Chris Skinner 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(3):605-623

Summary. The paper develops a data augmentation method to estimate the distribution function of a variable, which is partially observed, under a non-ignorable missing data mechanism, and where surrogate data are available. An application to the estimation of hourly pay distributions using UK Labour Force Survey data provides the main motivation. In addition to considering a standard parametric data augmentation method, we consider the use of hot deck imputation methods as part of the data augmentation procedure to improve the robustness of the method. The method proposed is compared with standard methods that are based on an ignorable missing data mechanism, both in a simulation study and in the Labour Force Survey application. The focus is on reducing bias in point estimation, but variance estimation using multiple imputation is also considered briefly.

8.

Small area estimation of proportions in business surveys

《Journal of Statistical Computation and Simulation》2012,82(6):783-795

Binary data are often of interest in business surveys, particularly when the aim is to characterize grouping in the businesses making up the survey population. When small area estimates are required for such binary data, use of standard estimation methods based on linear mixed models (LMMs) becomes problematic. We explore two model-based techniques of small area estimation for small area proportions, the empirical best predictor (EBP) under a generalized linear mixed model and the model-based direct estimator (MBDE) under a population-level LMM. Our empirical results show that both the MBDE and the EBP perform well. The EBP is a computationally intensive method, whereas the MBDE is easy to implement. In case of model misspecification, the MBDE also appears to be more robust. The mean-squared error (MSE) estimation of MBDE is simple and straightforward, which is in contrast to the complicated MSE estimation for the EBP.

9.

Cure rate survival models with missing covariates: a simulation study

Renata Santana Fonseca Heleno Bolfarine 《Journal of Statistical Computation and Simulation》2013,83(1):97-113

In this paper we study the cure rate survival model involving a competitive risk structure with missing categorical covariates. A parametric distribution that can be written as a sequence of one-dimensional conditional distributions is specified for the missing covariates. We consider the missing data at random situation so that the missing covariates may depend only on the observed ones. Parameter estimates are obtained by using the EM algorithm via the method of weights. Extensive simulation studies are conducted and reported to compare the efficiency of the estimates with and without missing data. As expected, the estimation approach taking into consideration the missing covariates presents much better efficiency in terms of mean square errors than the complete case situation. Effects of increasing cured fraction and censored observations are also reported. We demonstrate the proposed methodology with two real data sets. One involves the length of time to obtain a BS degree in Statistics, and the other the time to breast cancer recurrence.

10.

Tests of independence in incomplete multi-way tables using likelihood functions

Shin-Soo Kang Michael D. Larsen 《Journal of the Korean Statistical Society》2012,41(2):189-198

Kang (2006) and Kang and Larsen (in press) used the log likelihood function with Lagrangian multipliers for estimation of cell probabilities in two-way incomplete contingency tables. This paper extends results and simulations to three-way and multi-way tables. Numerous studies cross-classify subjects by three or more categorical factors. Constraints on cell probabilities are incorporated through Lagrangian multipliers. Variances of the MLEs are derived from the matrix of second derivatives of the log likelihood with respect to cell probabilities and the Lagrange multiplier. Wald and likelihood ratio tests of independence are derived using the estimates and estimated variances. In simulation results in Kang and Larsen (in press), for data missing at random, maximum likelihood estimation (MLE) produced more efficient estimates of population proportions than either multiple imputation (MI) based on data augmentation or complete case (CC) analysis. Neither MLE nor MI, however, leads to an improvement over CC analysis with respect to power of tests for independence in two-way tables. Results are extended to multidimensional tables with arbitrary patterns of missing data when the variables are recorded on individual subjects. In three-way and higher-way tables, however, there is information relevant for judging independence in partially classified information, as long as two or more variables are jointly observed. Simulations study three-dimensional tables with three patterns of association and two levels of missing information.

11.

Small-area estimation by combining time-series and cross-sectional data

J. N. K. Rao Mingyu Yu 《Revue canadienne de statistique》1994,22(4):511-528

A model involving autocorrelated random effects and sampling errors is proposed for small-area estimation, using both time-series and cross-sectional data. The sampling errors are assumed to have a known block-diagonal covariance matrix. This model is an extension of a well-known model, due to Fay and Herriot (1979), for cross-sectional data. A two-stage estimator of a small-area mean for the current period is obtained under the proposed model with known autocorrelation, by first deriving the best linear unbiased prediction estimator assuming known variance components, and then replacing them with their consistent estimators. Extending the approach of Prasad and Rao (1986, 1990) for the Fay-Herriot model, an estimator of mean squared error (MSE) of the two-stage estimator, correct to a second-order approximation for a small or moderate number of time points, T, and a large number of small areas, m, is obtained. The case of unknown autocorrelation is also considered. Limited simulation results on the efficiency of two-stage estimators and the accuracy of the proposed estimator of MSE are presented.

12.

Patterns of consent: evidence from a general household survey

Stephen P. Jenkins Lorenzo Cappellari Peter Lynn Annette Jäckle Emanuela Sala 《Journal of the Royal Statistical Society. Series A, (Statistics in Society)》2006,169(4):701-722

Summary. We analyse patterns of consent and consent bias in the context of a large general household survey, the 'Improving survey measurement of income and employment' survey, also addressing issues that arise when there are multiple consent questions. A multivariate probit regression model for four binary outcomes with two incidental truncations is used. We show that there are biases in consent to data linkage with benefit and tax credit administrative records that are held by the Department for Work and Pensions, and with wage and employment data held by employers. There are also biases in respondents' willingness and ability to supply their national insurance number. The biases differ according to the question that is considered. We also show that modelling questions on consent independently rather than jointly may lead to misleading inferences about consent bias. A positive correlation between unobservable individual factors affecting consent to Department for Work and Pensions record linkage and consent to employer record linkage is suggestive of a latent individual consent propensity.

13.

Structurally missing data problems in multiple list capture–recapture data

Peter G. M. van der Heijden Eugene Zwane David Hessen 《AStA Advances in Statistical Analysis》2009,93(1):5-21

14.

A Bayesian hierarchical record linkage model for integrating administrative records and its application

Dongyang Ding Lili Zhou 《Statistics & Information Forum》(统计与信息论坛) 2016,(7):30-35

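The technical problems of record linkage are closely tied to statistical theory. In particular, establishing classification rules for record linkage requires building a statistical model and identifying the key variables needed to complete data matching. A hierarchical model is constructed within a Bayesian framework to integrate administrative records. Matching error rates can be estimated through multivariate regression, and record linkage under a one-to-one constraint allows changes in the source of record information to be reflected through modules. Posterior distributions based on MCMC simulation are convenient to compute, which helps improve the efficiency of data integration.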

15.

Semiparametric estimation for weighted average derivatives with responses missing at random

Wanrong Liu Xuewen Lu Changchun Xie 《Journal of statistical planning and inference》2012,142(1):347-357

When responses are missing at random, we propose a semiparametric direct estimator for the missing probability and density-weighted average derivatives of a general nonparametric multiple regression function. An estimator for the normalized version of the weighted average derivatives is constructed as well using instrumental variables regression. The proposed estimators are computationally simple and asymptotically normal, and provide a solution to the problem of estimating index coefficients of single-index models with responses missing at random. The developed theory generalizes the method of the density-weighted average derivatives estimation of Powell et al. (1989) for the non-missing data case. Monte Carlo simulation studies are conducted to study the performance of the methods.

16.

Fill-in of missing data in univariate coastal data

Todd L. Walton 《Journal of applied statistics》1996,23(1):31-40

An approach to fill-in of missing data where gaps exist within an otherwise continuous record is addressed. Ad hoc methods of past approaches are discussed and limitations noted. An approach for providing an estimate of data filling consistent with data having a correlation structure is presented. The method provided is an extension of parametric modelling along with additional constraints imposed via a linear filter to account for variance preservation. The method is applied and compared to real data in which a portion of the record has been removed to simulate missing data. Results show the method to provide realistic missing data that preserves the correlation structure and variance of the measured data.

17.

Fractional Regression Hot Deck Imputation Weight Adjustment

Minhui Paik Michael D. Larsen 《Communications in Statistics - Simulation and Computation》2013,42(7):1514-1532

Fractional regression hot deck imputation (FRHDI) imputes multiple values for each instance of a missing dependent variable. The imputed values are equal to the predicted value plus multiple random residuals. Fractional weights enable variance estimation and preserve correlations. In some circumstances with some starting weight values, existing procedures for computing FRHDI weights can produce negative values. We discuss procedures for constructing non-negative adjusted fractional weights for FRHDI and study performance of the algorithm using simulation. The algorithm can be used effectively with FRHDI procedures for handling missing data in the context of a complex sample survey.

18.

Combining Inverse Probability Weighting and Multiple Imputation to Improve Robustness of Estimation

Peisong Han 《Scandinavian Journal of Statistics》2016,43(1):246-260

Inverse probability weighting (IPW) and multiple imputation are two widely adopted approaches dealing with missing data. The former models the selection probability, and the latter models data distribution. Consistent estimation requires correct specification of corresponding models. Although the augmented IPW method provides an extra layer of protection on consistency, it is usually not sufficient in practice as the true data-generating process is unknown. This paper proposes a method combining the two approaches in the same spirit of calibration in sampling survey literature. Multiple models for both the selection probability and data distribution can be simultaneously accounted for, and the resulting estimator is consistent if any model is correctly specified. The proposed method is within the framework of estimating equations and is general enough to cover regression analysis with missing outcomes and/or missing covariates. Results on both theoretical and numerical investigation are provided.
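For orientation, the plain IPW building block that this abstract starts from can be sketched as a weighted mean in which each observed outcome is weighted by the inverse of its selection probability. This is only the basic normalized (Hájek-type) IPW estimator, not the combined multiple-model method the paper proposes; the function name `ipw_mean`, the toy data, and the assumption that the selection probabilities are already known or estimated are all illustrative.

```python
def ipw_mean(y, observed, prob):
    """Normalized inverse probability weighted mean of an outcome.

    y        : outcomes (entries for missing cases are ignored)
    observed : 1 if the outcome was observed, 0 if it is missing
    prob     : assumed selection probability P(observed = 1 | covariates)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for yi, ri, pi in zip(y, observed, prob):
        if ri:
            weight = 1.0 / pi  # rarely observed cases count for more
            weighted_sum += weight * yi
            weight_total += weight
    return weighted_sum / weight_total


y = [1.0, 2.0, None, 4.0, None]
observed = [1, 1, 0, 1, 0]
prob = [0.5, 0.5, 0.5, 1.0, 0.25]  # hypothetical selection probabilities
estimate = ipw_mean(y, observed, prob)  # weights 2, 2, 1 give 10 / 5 = 2.0
```

If the selection probabilities are misspecified the estimator is generally inconsistent, which is the weakness the abstract's multiple-model approach is meant to guard against.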

19.

Inverse probability multiple weighted quantile regression estimation with missing data and its application

Lingnan Tai et al. 《Statistical Research》(统计研究) 2018,35(9):115-128

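Missing data are pervasive in applied research. Under the missing-at-random assumption, and from the perspective of model inference, this paper proposes a new and efficient estimation method for linear quantile regression models with missing data: inverse probability multiple weighting (IPMW) estimation. Building on inverse probability weighting (IPW) estimation, the method incorporates propensity score matching and the idea of model averaging, estimating repeatedly and weighting the results to determine the final parameter estimates. It applies when the response variable is independent and identically distributed or independent but not identically distributed, and it suits the vast majority of missing-data scenarios. Theoretical derivation and simulation studies show that the IPMW estimator inherits the advantages of the IPW estimator while being more robust. Finally, the method is applied to micro-level survey data containing missing values to study the factors influencing the consumption level of the middle-income group in economically developed near-first-tier cities. Comparing the estimates and confidence bands of the two methods shows that the IPMW estimator has a smaller standard deviation and gives more robust estimates.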

20.

Linear Regression With Nested Errors Using Probability‐Linked Data

Klairung Samart Ray Chambers 《Australian & New Zealand Journal of Statistics》2014,56(1):27-46

Probabilistic matching of records is widely used to create linked data sets for use in health science, epidemiological, economic, demographic and sociological research. Clearly, this type of matching can lead to linkage errors, which in turn can lead to bias and increased variability when standard statistical estimation techniques are used with the linked data. In this paper we develop unbiased regression parameter estimates to be used when fitting a linear model with nested errors to probabilistically linked data. Since estimation of variance components is typically an important objective when fitting such a model, we also develop appropriate modifications to standard methods of variance components estimation in order to account for linkage error. In particular, we focus on three widely used methods of variance components estimation: analysis of variance, maximum likelihood and restricted maximum likelihood. Simulation results show that our estimators perform reasonably well when compared to standard estimation methods that ignore linkage errors.