Similar Articles
20 similar articles found.
1.
We consider a Bayesian nonignorable model to accommodate a nonignorable selection mechanism for predicting small area proportions. Our main objective is to extend a selection-bias model from a previously published four-author paper to accommodate small areas. Those authors assume that the survey weights (or their reciprocals, which we also call selection probabilities) are available, but that there is no simple relation between the binary responses and the selection probabilities. To capture the nonignorable selection bias within each area, they assume that the binary responses and the selection probabilities are correlated. To accommodate the small areas, we extend their model to a hierarchical Bayesian nonignorable model and fit it using Markov chain Monte Carlo methods. We illustrate our methodology with a numerical example based on data on activity limitation from the U.S. National Health Interview Survey, and we perform a simulation study to assess the effect of the correlation between the binary responses and the selection probabilities.

2.
When multilevel models are estimated from survey data derived using multistage sampling, unequal selection probabilities at any stage of sampling may induce bias in standard estimators, unless the sources of the unequal probabilities are fully controlled for in the covariates. This paper proposes alternative ways of weighting the estimation of a two-level model by using the reciprocals of the selection probabilities at each stage of sampling. Consistent estimators are obtained when both the sample number of level 2 units and the sample number of level 1 units within sampled level 2 units increase. Scaling of the weights is proposed to improve the properties of the estimators and to simplify computation. Variance estimators are also proposed. In a limited simulation study the scaled weighted estimators are found to perform well, although non-negligible bias starts to arise for informative designs when the sample number of level 1 units becomes small. The variance estimators perform extremely well. The procedures are illustrated using data from the survey of psychiatric morbidity.
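The weight scaling that this abstract proposes can be illustrated with a small sketch. The two scalings shown (making the level-1 weights within a cluster sum to the cluster sample size, or to the effective sample size) are common choices in this literature; the function name and exact variants below are ours, not necessarily the paper's.

```python
import numpy as np

def scale_level1_weights(w, method="size"):
    """Scale level-1 weights within one sampled level-2 unit (cluster).

    w : raw level-1 weights (reciprocals of conditional selection
        probabilities) for the n_j sampled level-1 units in cluster j.
    method "size":      scaled weights sum to the cluster sample size n_j.
    method "effective": scaled weights sum to the effective sample size
                        (sum w)^2 / sum(w^2).
    """
    w = np.asarray(w, dtype=float)
    if method == "size":
        lam = len(w) / w.sum()
    elif method == "effective":
        lam = w.sum() / np.sum(w ** 2)
    else:
        raise ValueError("unknown scaling method")
    return lam * w

# Unequal conditional selection probabilities within one cluster:
w = np.array([2.0, 2.0, 4.0, 8.0])
scaled = scale_level1_weights(w, "size")   # sums to n_j = 4
```

Either scaling leaves the relative weights within a cluster unchanged; only the overall level of the weights, which drives the small-cluster bias discussed in the abstract, is affected.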

3.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3278-3300
Under complex survey sampling, and in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions differ, possibly resulting in selection bias. This article addresses this problem by fitting two statistical models for one-way analysis of variance under a complex survey design (for example, two-stage sampling, stratification, and unequal probabilities of selection): the variance components model (a two-stage model) and the fixed effects model (a single-stage model). Classical theory underlying the two-stage model assumes simple random sampling at each of the two stages, in which case the model holding for the sample, after sample selection, is the same as the model for the population before sample selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, possibly leading to false inference. The idea behind our approach is to extract the model holding for the sample data as a function of the population model and the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo maximum likelihood methods of estimation. The main feature of the proposed techniques is their behavior in terms of the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.
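The gap between the sample model and the population model under informative sampling can be demonstrated numerically. The sketch below assumes a hypothetical design in which the inclusion probability is proportional to exp(a*y); with a standard normal population, f_s(y) is proportional to pi(y) f_p(y), which works out to N(a, 1), so the unweighted sample mean is biased away from the population mean of 0.

```python
import numpy as np

rng = np.random.default_rng(3)

# Population model: y ~ N(0, 1).  Informative Poisson sampling with
# inclusion probability pi(y) proportional to exp(a * y); the resulting
# sample model f_s(y) ∝ pi(y) f_p(y) is N(a, 1).
a = 0.8
N = 200_000
y = rng.normal(size=N)
pi = 0.05 * np.exp(a * y - a * a / 2)   # scaled so pi stays in (0, 1)
pi = np.clip(pi, 0.0, 1.0)
sampled = rng.random(N) < pi

print(y[sampled].mean())   # near a = 0.8, not the population mean 0
```

This is exactly the relationship the abstract exploits: the sample model is recovered from the population model and the inclusion probabilities, and ignoring it biases the fit.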

4.
A common strategy for handling item nonresponse in survey sampling is hot deck imputation, where each missing value is replaced with an observed response from a "similar" unit. We discuss here the use of sampling weights in the hot deck. The naive approach is to ignore sample weights in the creation of adjustment cells, which effectively imputes the unweighted sample distribution of respondents in an adjustment cell, potentially causing bias. Alternative approaches have been proposed that use weights in the imputation by incorporating them into the probabilities of selection for each donor. We show by simulation that these weighted hot decks do not correct for bias when the outcome is related to the sampling weight and the response propensity. The correct approach is to use the sampling weight as a stratifying variable alongside additional adjustment variables when forming adjustment cells.
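The approach this abstract recommends, forming adjustment cells that use the sampling weight as a stratifying variable, can be sketched as follows. The data, cell construction, and function name are hypothetical illustrations, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def hot_deck_impute(y, missing, cells):
    """Impute each missing y with a randomly drawn observed donor
    from the same adjustment cell."""
    y = y.copy()
    for c in np.unique(cells):
        donors = y[(cells == c) & ~missing]
        need = (cells == c) & missing
        if len(donors) and need.any():
            y[need] = rng.choice(donors, size=need.sum())
    return y

# Cells formed by crossing an adjustment variable with sampling-weight
# strata, so that weight is a stratifier rather than a donor probability.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])          # adjustment variable
w_stratum = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # sampling-weight stratum
cells = x * 2 + w_stratum
y = np.array([1.0, 2.0, 5.0, 6.0, 3.0, 4.0, 7.0, np.nan])
missing = np.isnan(y)
y_imp = hot_deck_impute(y, missing, cells)
```

Because the weight stratum enters the cell definition, each missing value can only receive a donor with a similar weight, which is what corrects the bias the abstract describes.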

5.
It is well known that nonignorable item nonresponse may occur when the cause of the nonresponse is the value of the latent variable of interest. In such cases, a respondent's refusal to answer specific questions in a survey should sometimes be treated as a nonignorable item nonresponse. The Rasch-Rasch model (RRM) is a new two-dimensional item response theory model for addressing nonignorable nonresponse. This article demonstrates the use of the RRM on data from an Italian survey on the assessment of healthcare workers' knowledge about sudden infant death syndrome (a context in which nonresponse is presumed to be more likely among individuals with a low level of competence). We compare the performance of the RRM with other models within the Rasch model family that assume unidimensionality of the latent trait. We conclude that this assumption should be considered unreliable for the data at hand, whereas the RRM provides a better fit to the data.

6.
Modeling survey data often requires knowledge of the design and weighting variables. With public-use survey data, some of these variables may not be available for confidentiality reasons. The proposed approach can be used in this situation, as long as calibrated weights and variables specifying the strata and primary sampling units are available. It gives consistent point estimation and pivotal statistics for testing and confidence intervals. The proposed approach does not rely on with-replacement sampling, single-stage designs, negligible sampling fractions, or noninformative sampling. Adjustments based on design effects, eigenvalues, joint-inclusion probabilities, or the bootstrap are not needed. The inclusion probabilities and auxiliary variables do not have to be known. Multistage designs with unequal selection of primary sampling units are considered. Nonresponse can be easily accommodated if the calibrated weights include a reweighting adjustment for nonresponse. We use an unconditional approach, in which the variables and the sample are random. The design can be informative.

7.
Incomplete data subject to nonignorable nonresponse are often encountered in practice and suffer from a non-identifiability problem. A follow-up sample is randomly selected from the set of nonrespondents to avoid the non-identifiability problem and obtain complete responses. Glynn, Laird, and Rubin analyzed nonignorable missing data with a follow-up sample under a pattern mixture model. In this article, maximum likelihood estimation of the parameters of categorical missing data with a follow-up sample is considered under a selection model. To estimate the parameters with nonignorable missing data, the EM algorithm with weighting proposed by Ibrahim is used: in the E-step, the weighted mean is calculated using fractional weights for the imputed data. Variances are estimated using an approximate jackknife method. Simulation results are presented to compare the proposed method with previously presented methods.

8.
Survey sampling draws repeated samples from a finite population and uses the sample data to estimate a target variable of the population. If, however, the sampling process is related to the target variable, the sample distribution does not represent the population distribution, and estimates of the population based on the sample data can be severely biased. This paper discusses how to estimate the target variable under such a nonignorable sampling mechanism, describes three methods for handling this problem in detail, and compares the three methods, concluding that the third method, based on the probability density function, is the better way to handle the problem.

9.
Randomized response is an interview technique designed to eliminate response bias when sensitive questions are asked. In this paper, we present a logistic regression model on randomized response data when the covariates on some subjects are missing at random. In particular, we propose Horvitz and Thompson (1952)-type weighted estimators by using different estimates of the selection probabilities. We present large sample theory for the proposed estimators and show that they are more efficient than the estimator using the true selection probabilities. Simulation results support theoretical analysis. We also illustrate the approach using data from a survey of cable TV.
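A Horvitz-Thompson-type weighted logistic fit of the kind described can be sketched as below: complete cases receive weight equal to the inverse of their selection probability, and the weighted score equation is solved by Newton-Raphson. The data and function name are hypothetical; the paper's randomized-response likelihood and its estimated selection probabilities are not reproduced here.

```python
import numpy as np

def ipw_logistic(X, y, observed, pi_hat, n_iter=25):
    """Horvitz-Thompson-type estimator: complete cases are weighted by
    the inverse of their (possibly estimated) selection probability, and
    the weighted logistic score equation is solved by Newton-Raphson."""
    w = observed / pi_hat                     # incomplete cases get weight 0
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))           # weighted score
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(info, score)
    return beta

# Hypothetical data: intercept plus one covariate, every fifth case has
# incomplete covariate information and is dropped via a zero weight.
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 - X[:, 1])))).astype(float)
observed = np.ones(n)
observed[::5] = 0.0                           # 20% incomplete cases
pi_hat = np.full(n, 0.8)                      # selection probabilities
beta_hat = ipw_logistic(X, y, observed, pi_hat)
```

Rows with zero weight contribute nothing to the score or information, so the fit coincides with a complete-case analysis reweighted by 1/pi.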

10.
The sampling-importance resampling (SIR) algorithm aims at drawing a random sample from a target distribution π. First, a sample is drawn from a proposal distribution q, and then from this a smaller sample is drawn with sample probabilities proportional to the importance ratios π/q. We propose here a simple adjustment of the sample probabilities and show that this gives faster convergence. The results indicate that our version also converges better for small sample sizes. The SIR algorithms are compared with the Metropolis–Hastings (MH) algorithm with independent proposals. Although MH converges asymptotically faster, the results indicate that our improved SIR version is better than MH for small sample sizes. We also establish a connection between the SIR algorithms and importance sampling with normalized weights, and show that using the adjusted SIR sample probabilities as importance weights reduces the bias of the importance sampling estimate.
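The baseline SIR algorithm that this paper improves on can be sketched in a few lines; the paper's adjustment of the resampling probabilities is not reproduced here, and the target/proposal pair below is our own toy example.

```python
import numpy as np

rng = np.random.default_rng(42)

def sir(target_pdf, proposal_sample, proposal_pdf, m):
    """Standard sampling-importance resampling: draw m points from the
    proposal sample with probabilities proportional to the importance
    ratios pi/q.  (The paper's improvement adjusts these probabilities.)"""
    w = target_pdf(proposal_sample) / proposal_pdf(proposal_sample)
    p = w / w.sum()
    idx = rng.choice(len(proposal_sample), size=m, replace=False, p=p)
    return proposal_sample[idx]

# Toy example: target pi = N(0, 1), proposal q = N(0, 2^2).
n, m = 20_000, 1_000
xs = rng.normal(0.0, 2.0, size=n)
target = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
prop = lambda x: np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))
draws = sir(target, xs, prop, m)
```

The resampled draws are approximately distributed as the target; the paper's contribution is a correction to `p` that sharpens this approximation, especially for small m.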

11.
In a longitudinal set-up, to examine the effects of certain fixed covariates on the repeated binary responses, there exists an approach to model the binary probabilities through a dynamic logistic relationship. In some practical situations such as in longitudinal clinical studies, it may happen that some of the covariates such as treatments are selected randomly following an adaptive design, whereas the rest of the covariates may be fixed by nature. The purpose of this study is to examine the effects of the design weights selection on the parameter estimation including the treatment effects, after taking the longitudinal correlations of the repeated binary responses into account.

12.
Communications in Statistics - Theory and Methods, 2012, 41(16-17): 3150-3161
We consider a new approach to dealing with nonignorable nonresponse on an outcome variable in a causal inference framework. Assuming that a binary instrumental variable for nonresponse is available, we provide a likelihood-based approach to identify and estimate heterogeneous causal effects of a binary treatment on specific latent subgroups of units, named principal strata, defined by the nonresponse behavior under each level of the treatment and of the instrument. We show that, within each stratum, nonresponse is ignorable and respondents can be properly compared by treatment status. To assess our method and its robustness when the usually invoked assumptions are relaxed or misspecified, we simulate data resembling a real experiment, conducted on a panel survey, that compares different methods of reducing panel attrition.

13.
Approaches that use the pseudolikelihood to perform multilevel modelling on survey data have been presented in the literature. To avoid biased estimates due to unequal selection probabilities, conditional weights can be introduced at each level. Less biased estimators can also be obtained in a two-level linear model if the level-1 weights are scaled. In this paper, we study several level-2 weights that can be introduced into the pseudolikelihood when the sampling design and the hierarchical structure of the multilevel model do not match. Two-level and three-level models are studied. The present work was motivated by a study that aims to estimate the contributions of lead sources to the contamination of interior floor dust in rooms within dwellings. We performed a simulation study using real data collected from a French survey to achieve our objective. We conclude that it is preferable to use unweighted analyses or, at most, conditional level-2 weights in a two-level or three-level model. We state some warnings and make some recommendations.

14.
Often in observational studies of time to an event, the study population is a biased (i.e., unrepresentative) sample of the target population. In the presence of biased samples, it is common to weight subjects by the inverse of their respective selection probabilities. Pan and Schaubel (Can J Stat 36:111-127, 2008) recently proposed inference procedures for an inverse selection probability weighted (ISPW) Cox model, applicable when selection probabilities are not treated as fixed but estimated empirically. The proposed weighting procedure requires auxiliary data to estimate the weights and is computationally more intense than unweighted estimation. The ignorability of the sample selection process, in terms of parameter estimators and predictions, is often of interest from several perspectives: for example, to determine whether weighting makes a significant difference to the analysis at hand, which in turn addresses whether the collection of auxiliary data is required in future studies, or to evaluate previous studies that did not correct for selection bias. In this article, we propose methods to quantify the degree of bias corrected by the weighting procedure in the partial likelihood and Breslow-Aalen estimators. Asymptotic properties of the proposed test statistics are derived. The finite-sample significance level and power are evaluated through simulation. The proposed methods are then applied to data from a national organ failure registry to evaluate the bias in a post-kidney-transplant survival model.

15.
We consider the problem of sequentially deciding which of two treatments is superior. A class of simple approximate sequential tests is proposed. These have probabilities of correct selection that are approximately independent of the sampling rule and depend on unknown parameters only through the function of interest, such as the difference or ratio of mean responses. The tests are obtained using a normal approximation, which is also employed to derive approximate expressions for the probabilities of correct selection and the expected sample sizes. A class of data-dependent sampling rules is proposed for minimizing any weighted average of the expected sample sizes on the two treatments, with the weights allowed to depend on unknown parameters. The tests are studied in particular cases, such as exponentially distributed responses.

16.
Social data often contain missing information. The problem is inevitably severe when analysing historical data. Conventionally, researchers analyse complete records only. Listwise deletion not only reduces the effective sample size but also may result in biased estimation, depending on the missingness mechanism. We analyse household types by using population registers from ancient China (618-907 AD) by comparing a simple classification, a latent class model of the complete data and a latent class model of the complete and partially missing data assuming four types of ignorable and non-ignorable missingness mechanisms. The findings show that either a frequency classification or a latent class analysis using the complete records only yielded biased estimates and incorrect conclusions in the presence of partially missing data of a non-ignorable mechanism. Although simply assuming ignorable or non-ignorable missing data produced consistently similarly higher estimates of the proportion of complex households, a specification of the relationship between the latent variable and the degree of missingness by a row effect uniform association model helped to capture the missingness mechanism better and improved the model fit.

17.
Efficient statistical inference on nonignorable missing data is a challenging problem. This paper proposes a new estimation procedure based on composite quantile regression (CQR) for linear regression models with nonignorable missing data, which is applicable even with high-dimensional covariates. A parametric model is assumed for the response probability, which is estimated by the empirical likelihood approach. Local identifiability of the proposed strategy is guaranteed on the basis of an instrumental variable approach. A set of data-based adaptive weights constructed via an empirical likelihood method is used to weight the CQR functions. The proposed method is resistant to heavy-tailed errors or outliers in the response. An adaptive penalisation method for variable selection is proposed to achieve sparsity with high-dimensional covariates. Limiting distributions of the proposed estimators are derived. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies, and an application to the ACTG 175 data is presented.

18.
We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model. The mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnosis. The model successfully identifies the desired regions and performs well compared with other available alternatives.

19.
Data from large surveys are often supplemented with sampling weights that are designed to reflect unequal probabilities of response and selection inherent in complex survey sampling methods. We propose two methods for Bayesian estimation of parametric models in a setting where the survey data and the weights are available, but where information on how the weights were constructed is unavailable. The first approach is to simply replace the likelihood with the pseudo likelihood in the formulation of Bayes' theorem. This is proven to lead to a consistent estimator, but also to credible intervals that suffer from systematic undercoverage. Our second approach uses the weights to generate a representative sample which is integrated into a Markov chain Monte Carlo (MCMC) or other simulation algorithm designed to estimate the parameters of the model. In extensive simulation studies, the latter methodology is shown to achieve performance comparable to the standard frequentist solution of pseudo maximum likelihood, with the added advantage of being applicable to models that require inference via MCMC. The methodology is demonstrated further by fitting a mixture of gamma densities to a sample of Australian household income.
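The first approach described, replacing the likelihood in Bayes' theorem with the pseudo (weighted) likelihood, can be sketched with a toy normal-mean model. The model, step size, and data below are hypothetical; note that, as the abstract states, intervals from such a pseudo-posterior can systematically undercover.

```python
import numpy as np

rng = np.random.default_rng(7)

def pseudo_posterior_mh(y, w, n_iter=5000, step=0.5):
    """Random-walk Metropolis on the pseudo-posterior: the likelihood in
    Bayes' theorem is replaced by the weighted (pseudo) likelihood
    prod_i f(y_i | theta)^{w_i}.  Toy model: y_i ~ N(theta, 1), flat prior."""
    def log_pseudo_lik(theta):
        return -0.5 * np.sum(w * (y - theta) ** 2)
    theta = y.mean()
    chain = []
    for _ in range(n_iter):
        proposal = theta + step * rng.normal()
        if np.log(rng.random()) < log_pseudo_lik(proposal) - log_pseudo_lik(theta):
            theta = proposal
        chain.append(theta)
    return np.array(chain)

# Informative design: units with larger y were oversampled, so they carry
# smaller weights; the pseudo-posterior centres on the weighted mean.
y = np.array([0.2, 0.5, 1.8, 2.1])
w = np.array([3.0, 3.0, 1.0, 1.0])
chain = pseudo_posterior_mh(y, w)
```

Here the pseudo-posterior mean is the weighted mean 0.75 rather than the unweighted mean 1.15, illustrating how the weights correct selection bias in the point estimate even though interval coverage is not guaranteed.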

20.
We consider surveys with one or more callbacks and use a series of logistic regressions to model the probabilities of nonresponse at first contact and subsequent callbacks. These probabilities are allowed to depend on covariates as well as the categorical variable of interest and so the nonresponse mechanism is nonignorable. Explicit formulae for the score functions and information matrices are given for some important special cases to facilitate implementation of the method of scoring for obtaining maximum likelihood estimates of the model parameters. For estimating finite population quantities, we suggest the imputation and prediction approaches as alternatives to weighting adjustment. Simulation results suggest that the proposed methods work well in reducing the bias due to nonresponse. In our study, the imputation and prediction approaches perform better than weighting adjustment and they continue to perform quite well in simulations involving misspecified response models.
