Similar Documents
20 similar documents found (search time: 437 ms)
1.
We show how to make inferences about a finite population proportion using data from a possibly biased sample. In the absence of any selection bias or survey weights, a simple ignorable selection model, which assumes that the binary responses are independent and identically distributed Bernoulli random variables, is not unreasonable. However, this ignorable selection model is inappropriate when there is selection bias in the sample. We assume that the survey weights (or their reciprocals, which we call 'selection' probabilities) are available, but that there is no simple relation between the binary responses and the selection probabilities. To capture the selection bias, we assume that there is some correlation between the binary responses and the selection probabilities (e.g., there may be a somewhat higher/lower proportion of positive responses among the sampled units than among the nonsampled units). We use a Bayesian nonignorable selection model to accommodate the selection mechanism, and Markov chain Monte Carlo methods to fit it. We illustrate our method using numerical examples obtained from the 1995 NHIS data.

2.
In sample surveys of finite populations, subpopulations for which the sample size is too small for estimation of adequate precision are referred to as small domains. Demand for small domain estimates has been growing in recent years among users of survey data. We explore the possibility of enhancing the precision of domain estimators by combining comparable information collected in multiple surveys of the same population. For this, we propose a regression method of estimation that is essentially an extended calibration procedure, whereby comparable domain estimates from the various surveys are calibrated to each other. We show through analytic results and an empirical study that this method may greatly improve the precision of domain estimators for the variables that are common to these surveys, as these estimators make effective use of the increased sample size for the common survey items. The design-based direct estimators proposed involve only domain-specific data on the variables of interest. This is in contrast with small domain (mostly small area) indirect estimators, based on a single survey, which incorporate through modelling data that are external to the targeted small domains. The proposed approach is also highly effective in handling the closely related problem of estimation for rare population characteristics.

3.
In practical survey sampling, missing data are unavoidable due to nonresponse, observations rejected in editing, disclosure control, or outlier suppression. We propose a calibrated imputation approach so that valid point and variance estimates of the population (or domain) totals can be computed by secondary users with simple complete-sample formulae. This is especially helpful for variance estimation, which generally requires additional information and tools that are unavailable to secondary users. Our approach is natural for continuous variables, where the estimation may be based either on reweighting or on imputation, including possibly their outlier-robust extensions. We also propose a multivariate procedure to accommodate the estimation of the covariance matrix between estimated population totals, which facilitates variance estimation of ratios or differences among the estimated totals. We illustrate the proposed approach using simulated data in supplementary materials that are available online.

4.
Complex survey sampling is often used to sample a fraction of a large finite population. In general, the survey is conducted so that each unit (e.g. subject) in the sample has a different probability of being selected into the sample. For generalizability of the sample to the population, both the design and the probability of selection must be incorporated in the analysis. In this paper we focus on non-standard regression models for complex survey data. In our motivating example, based on data from the Medical Expenditure Panel Survey, the outcome variable is the subject's total health care expenditures in the year 2002. Previous analyses of medical cost data suggest that the variance is approximately equal to the mean raised to the power of 1.5, a non-standard variance function. Currently, the regression parameters for this model cannot be easily estimated in standard statistical software packages. We propose a simple two-step method that yields consistent estimates of the regression parameters and their variances, and that can be implemented within any standard sample survey package. The approach is applicable to complex sample surveys with any number of stages.
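The two-step procedure itself is not spelled out in this abstract. As background, a model with log link and variance function V(mu) = mu^1.5 can be fitted by quasi-likelihood Fisher scoring. The sketch below is a generic, unweighted illustration of that estimating-equation idea, not the authors' method: it ignores the survey design, and the intercept-first design matrix `X` and the name `quasi_fit` are our own assumptions.

```python
import numpy as np

def quasi_fit(X, y, power=1.5, n_iter=50, tol=1e-8):
    """Fisher scoring for a quasi-likelihood regression with log link
    and variance function V(mu) = mu**power (here power = 1.5).
    X is assumed to have an intercept in its first column."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # crude starting value
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # score: sum x_i (y_i - mu_i) * mu_i / V(mu_i), log link
        score = X.T @ ((y - mu) * mu ** (1 - power))
        # expected information: sum x_i x_i' mu_i^2 / V(mu_i)
        info = X.T @ ((mu ** (2 - power))[:, None] * X)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With noiseless data generated from the model, the scoring iteration recovers the true coefficients, which is a quick sanity check on the estimating equations.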

5.
金勇进  张喆 《统计研究》2014,31(9):79-84
Weights play a vital role when sample data are used to make inferences about a population. Weights not only scale the sample up to the population but also adjust the sample structure so that it matches the population structure; the correct use of weights is therefore the foundation of statistical inference. This paper systematically describes how weights are obtained in survey analysis and how the initial weights are subsequently adjusted. Weights are a double-edged sword: while improving precision they may also inflate the error of the estimator, so we propose methods for evaluating weights and discuss how they can be controlled. Finally, an empirical analysis based on data from the Chinese General Social Survey (CGSS) shows that the proposed approach not only improves estimation precision but also reduces the weight effect in sampling inference.

6.
Adaptive cluster sampling is an efficient method for estimating the parameters of rare and clustered populations. The method mimics how biologists would like to collect data in the field, by targeting survey effort at localised areas where the rare population occurs. Another popular design is inverse sampling, which was developed to obtain a sample of rare events having a predetermined size; ideally, the resulting sample is large enough to ensure reliable estimation of the population parameters. In an effort to combine the good properties of these two designs, we introduce inverse adaptive cluster sampling with unequal selection probabilities. We develop an unbiased estimator of the population total that is applicable to data obtained from such designs, together with numerical approximations to this estimator. The efficiency of the proposed estimators is investigated through simulation studies based on two real populations: crabs in Al Khor, Qatar, and arsenic pollution in Kurdistan, Iran. The simulation results show that our estimators are efficient.

7.
The number of people to select within selected households has significant consequences for the conduct and output of household surveys. The operational and data quality implications of this choice are carefully considered in many surveys, but the effect on statistical efficiency is not well understood. The usual approach is to select all people in each selected household where operational and data quality concerns make this feasible; if not, one person is usually selected from each selected household. We find that this strategy is not always justified, and we develop intermediate designs between these two extremes. Current practices were developed when household survey field procedures needed to be simple and robust; more complex designs are now feasible owing to the increasing use of computer-assisted interviewing. We develop more flexible designs by optimizing survey cost, based on a simple cost model, subject to a required variance for an estimator of the population total. The innovation lies in the fact that household sample sizes are small integers, which creates challenges in both design and estimation. The new methods are evaluated empirically using census and health survey data, showing considerable improvement over existing methods in some cases.

8.
Sampling is a universally accepted approach for gathering information and data mining, since a reasonably modest-sized sample can sufficiently characterize a much larger population. In stratified sampling designs, the population is divided into homogeneous strata in order to achieve higher precision in estimation. This paper proposes an efficient method of constructing optimum stratum boundaries (OSB) and determining the optimum sample size (OSS) for the survey variable. Because the survey variable of interest is unavailable before the survey is conducted, the method is based on an auxiliary variable that is usually readily available from past surveys. As an illustration with real data, the auxiliary variable considered here follows a Weibull distribution. The stratification problem is formulated as a mathematical programming problem (MPP) that minimizes the variance of the estimated population parameter under Neyman allocation. The solution procedure employs dynamic programming, which yields substantial gains in the precision of the estimates of the population characteristics.
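As background to the variance minimization mentioned above, Neyman allocation has a simple closed form: the total sample size is distributed across strata in proportion to N_h * S_h, the stratum size times the stratum standard deviation. A minimal sketch (the function name and the naive rounding are our own; a production allocation would apportion the rounding remainders more carefully):

```python
import numpy as np

def neyman_allocation(n, N_h, S_h):
    """Neyman allocation: split total sample size n across strata in
    proportion to N_h * S_h, which minimises the variance of the
    stratified estimator of the mean for a fixed total sample size."""
    weights = np.asarray(N_h, dtype=float) * np.asarray(S_h, dtype=float)
    alloc = n * weights / weights.sum()
    return np.round(alloc).astype(int)
```

For example, with strata of sizes 100, 200, 300 and standard deviations 10, 5, 2, the small, highly variable stratum receives far more than its proportional share of the sample.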

9.
张勇 《统计研究》2007,24(11):69-73
China designated its national sample survey counties in 1984, and these counties have now been in use for more than 20 years. Many agricultural surveys have been conducted in these counties, and for a number of reasons the set of counties has never been changed. It is understandable that their representativeness has been questioned in recent years. We should explain why these survey counties have been kept unchanged and look for good ways to improve the surveys. This paper presents a method that uses adjustment coefficients to address the representativeness of these survey counties; a simulation using data from the first China Agricultural Census gives good results. The second national agricultural census is being conducted in 2007, and its results can be used to refine the sample surveys and improve the representativeness of the national sample survey counties. We suggest a multiple-frame design that incorporates an area sampling frame, and counties could be considered as the primary sampling units for China's agricultural surveys.

10.
《统计学通讯:理论与方法》2012,41(16-17):3278-3300
Under complex survey sampling, in particular when selection probabilities depend on the response variable (informative sampling), the sample and population distributions differ, possibly resulting in selection bias. This article addresses the problem by fitting two statistical models for one-way analysis of variance under a complex survey design (e.g., two-stage sampling, stratification, and unequal selection probabilities): the variance components model (a two-stage model) and the fixed effects model (a single-stage model). Classical theory underlying the two-stage model assumes simple random sampling at each of the two stages, in which case the model holding for the sample, after sample selection, is the same as the model for the population before selection. When the selection probabilities are related to the values of the response variable, standard estimates of the population model parameters may be severely biased, possibly leading to false inference. The idea behind the approach is to extract the model holding for the sample data as a function of the population model and the first-order inclusion probabilities, and then to fit the sample model using analysis of variance, maximum likelihood, and pseudo-maximum-likelihood estimation. The main feature of the proposed techniques is their behaviour with respect to the informativeness parameter. We also show that using the population model while ignoring the informative sampling design yields biased model fitting.
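The extraction step described here is commonly based on the following standard identity from the informative-sampling literature (stated as background, in our notation rather than the article's): the sample density of the response is the population density tilted by the conditional inclusion probability,

\[
f_s(y_i \mid x_i) \;=\; \frac{\Pr(i \in s \mid y_i, x_i)\, f_p(y_i \mid x_i)}{\Pr(i \in s \mid x_i)},
\qquad
\Pr(i \in s \mid x_i) = \int \Pr(i \in s \mid y, x_i)\, f_p(y \mid x_i)\, dy,
\]

so the sample model coincides with the population model exactly when \(\Pr(i \in s \mid y_i, x_i)\) does not depend on \(y_i\), i.e. under noninformative selection.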

11.
The purpose of this paper is to account for informative sampling in fitting time series models, in particular a first-order autoregressive model, to longitudinal survey data. The idea behind the proposed approach is to extract the model holding for the sample data as a function of the model in the population and the first-order inclusion probabilities, and then fit the sample model using maximum-likelihood, pseudo-maximum-likelihood, and estimating-equations methods. A new test for sampling ignorability is proposed, based on the Kullback-Leibler information measure. We also investigate the sensitivity of the sample model to incorrect specification of the conditional expectations of the sample inclusion probabilities. The simulation study carried out shows that the sample-likelihood-based method produces better estimators than the pseudo-maximum-likelihood method, and that sensitivity to departures from the assumed model is low. We also find that both the conventional t-statistic and the Kullback-Leibler information statistic for testing sampling ignorability perform well under both informative and noninformative sampling designs.

12.
A model-based predictive estimator is proposed for the population proportions of a polychotomous response variable, based on a sample from the population and on auxiliary variables whose values are known for the entire population. The responses for the non-sample units are predicted using a multinomial logit model, which is a parametric function of the auxiliary variables. A bootstrap estimator is proposed for the variance of the predictive estimator; its consistency is proved and its small-sample performance is compared with that of an analytical estimator. The proposed predictive estimator is compared with other available estimators, including model-assisted ones, both in a simulation study involving different sampling designs and model mis-specification, and using real data from an opinion survey. The results indicate that the prediction approach appears to use auxiliary information more efficiently than the model-assisted approach.

13.
In non-experimental research, data on the same population process may be collected simultaneously by more than one instrument. In the present application, for example, two sample surveys and a population birth registration system all collect observations on first births by age and year, while the two surveys additionally collect information on women's education. To make maximum use of the three data sources, the survey data are pooled and the population data are introduced as constraints in a logistic regression equation. Introducing the population data as constraints reduces the standard errors of the age and birth-cohort parameters of the regression equation by about three-quarters. A halving of the standard errors of the education parameters is achieved by pooling observations from the larger survey dataset with those from the smaller survey. The percentage reduction in the standard errors through imposing population constraints is independent of the total survey sample size.

14.
In sample surveys and many other areas of application, the ratio of variables is often of great importance. This often occurs when one variable is available at the population level while another variable of interest is available only for the sample data. In this case, the sample ratio can provide valuable information on the variable of interest for the unsampled observations. In many other studies, the ratio itself is of interest, for example when estimating proportions from a random number of observations. In this note we compare three confidence intervals for the population ratio: a large-sample interval, a log-based version of the large-sample interval, and Fieller's interval. This is done through data analysis and a small simulation experiment. The Fieller method has often been proposed as a superior interval for small sample sizes. We show through a data example and simulation experiments that Fieller's method often gives nonsensical and uninformative intervals when the observations are noisy relative to the mean of the data. The large-sample interval does not suffer in the same way and thus can be a more reliable method for small and large samples.
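For reference, two of the intervals compared in this note can be sketched as follows. This is the generic textbook construction for the ratio of means from paired data, not necessarily the authors' exact setup, and the function name is ours. Fieller's interval is the solution set of a quadratic; when the leading coefficient is non-positive the set is unbounded, which is one way the "nonsensical and uninformative" intervals mentioned above can arise.

```python
import numpy as np

def ratio_intervals(x, y, z=1.96):
    """Delta-method (large-sample) and Fieller intervals for the
    ratio mean(y)/mean(x), from paired observations x, y."""
    n = len(x)
    xbar, ybar = np.mean(x), np.mean(y)
    vxx = np.var(x, ddof=1) / n                 # Var(xbar)
    vyy = np.var(y, ddof=1) / n                 # Var(ybar)
    vxy = np.cov(x, y, ddof=1)[0, 1] / n        # Cov(xbar, ybar)
    r = ybar / xbar
    # delta-method standard error of the ratio
    se = np.sqrt(vyy - 2 * r * vxy + r ** 2 * vxx) / abs(xbar)
    delta_ci = (r - z * se, r + z * se)
    # Fieller: roots of a*R^2 + b*R + c = 0
    a = xbar ** 2 - z ** 2 * vxx
    b = -2 * (xbar * ybar - z ** 2 * vxy)
    c = ybar ** 2 - z ** 2 * vyy
    disc = b ** 2 - 4 * a * c
    if a <= 0 or disc < 0:
        fieller_ci = None    # unbounded or empty: uninformative case
    else:
        roots = np.sort(np.roots([a, b, c]))
        fieller_ci = (roots[0], roots[1])
    return delta_ci, fieller_ci
```

When the denominator mean is far from zero relative to its noise, the two intervals are close; as the noise grows, `a` approaches zero and the Fieller interval blows up.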

15.
Unweighted estimators using data collected in a sample survey can be badly biased, whereas weighted estimators are approximately unbiased for population parameters. We present four examples using data from the 1988 National Maternal and Infant Health Survey to demonstrate that weighted and unweighted estimators can be quite different, and to show the underlying causes of such differences.
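The bias mechanism can be seen in a toy example (hypothetical numbers, not from the NMIHS): when units are sampled with unequal probabilities, the unweighted mean reflects the sample composition, while weighting by inverse selection probabilities recovers the population mean.

```python
import numpy as np

# Two strata: stratum 1 (y = 1) sampled at rate 0.01, stratum 2
# (y = 3) oversampled at rate 0.10.  Implied population: 4000 units
# with y = 1 and 600 units with y = 3, so the true mean is 5800/4600.
y = np.array([1.0] * 40 + [3.0] * 60)     # sample responses
p = np.array([0.01] * 40 + [0.10] * 60)   # selection probabilities
w = 1.0 / p                               # design weights

unweighted = y.mean()                     # biased towards stratum 2
weighted = np.sum(w * y) / np.sum(w)      # approximately unbiased
```

Here the unweighted mean is 2.2, while the weighted mean equals the population mean of about 1.26, illustrating how ignoring the weights overstates the oversampled group.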

16.
Measurement error, the difference between the measured (observed) value of a quantity and its true value, is perceived as a possible source of estimation bias in many surveys. To correct for such bias, a validation sample can be used in addition to the original sample. Depending on the type of validation sample, either an internal calibration approach or an external calibration approach can be used. Motivated by the Korean Longitudinal Study of Aging (KLoSA), we propose a novel application of fractional imputation to correct for measurement error in the analysis of survey data. The proposed method creates imputed values of the unobserved true variables, which are mismeasured in the main study, using the validation subsample. Furthermore, the method is directly applicable when the measurement error model is a mixture distribution. Variance estimation using Taylor linearization is developed, and results from a limited simulation study are presented.

17.
巩红禹  陈雅 《统计研究》2018,35(12):113-122
This paper addresses two problems: improving sample representativeness and multi-purpose surveys. First, we propose a new multi-purpose sampling method that improves representativeness by combining an increase in sample size with an adjustment of the sample structure: a supplementary balanced design. Additional units are drawn so that, combined with the original sample, they form a new balanced sample, reducing the structural deviation between the sample and the population relative to the initial sample. A balanced sample is one for which the Horvitz-Thompson estimators of the totals of the auxiliary variables equal the true population totals. Second, by choosing auxiliary variables related to several target parameters, a balanced sample can represent the population well for each of those parameters, so that a single sample supports a multi-purpose survey. Using county-level data from the 2010 sixth population census and several target parameters, a post-hoc evaluation of the supplemented balanced sample shows that the supplementary balanced design effectively improves the sample structure, bringing it close to the population structure and reducing the error of the target estimates; it also shows that balanced sampling can support multi-purpose surveys and improve the efficiency with which the sample is used.

18.
In this study, we consider the application of the James-Stein estimator to population means from a class of arbitrary populations, based on ranked set samples (RSS). We consider a basis for optimally combining sample information from several data sources, and succinctly develop the asymptotic theory of simultaneous estimation of several means for differing replications based on the well-defined shrinkage principle. We show that a shrinkage-type estimator will have, under quadratic loss, a substantial risk reduction relative to the classical estimators based on simple random sampling and RSS. Asymptotic distributional quadratic biases and risks of the shrinkage estimators are derived and compared with those of the classical estimator. A simulation study supports the asymptotic results. An overriding theme of this study is that shrinkage estimation provides a powerful extension of its traditional counterpart to non-normal populations. Finally, a real data set is used to illustrate the computation of the proposed estimators.
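For orientation, the classical positive-part James-Stein estimator that shrinks several independent sample means towards their grand mean can be sketched as below. This is the textbook simple-random-sampling version, not the article's RSS-based estimator; the function name and the common-variance assumption are ours.

```python
import numpy as np

def james_stein(means, var):
    """Positive-part James-Stein estimator shrinking k >= 4 independent
    sample means towards their grand mean.  `var` is the (common)
    variance of each sample mean, assumed known."""
    means = np.asarray(means, dtype=float)
    k = means.size
    grand = means.mean()
    ss = np.sum((means - grand) ** 2)
    # shrinkage factor, truncated at zero (positive-part rule)
    shrink = max(0.0, 1.0 - (k - 3) * var / ss)
    return grand + shrink * (means - grand)
```

Each component is pulled towards the grand mean, with outlying means pulled hardest in absolute terms, which is the source of the risk reduction under quadratic loss.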

19.
How to carry out statistical inference from volunteer web-access panel surveys is a serious challenge that online surveys face in the era of big data. To address this problem, we propose combining the web-access panel sample with a probability sample, constructing pseudo-weights via inverse propensity-score weighting and weighting-class adjustment to estimate the target population. Variances are then estimated by the Vwr method based on with-replacement probability sampling, the Vgreg method based on generalized regression estimation, and the jackknife, and the performance of the different estimators is compared. The study shows that, whether the probability sample is large or small, the proposed population-mean estimators perform well, and among the variance estimators the jackknife performs best.

20.
Sample surveys of below-designated-size industrial enterprises are an important component of socio-economic statistical surveys and provide basic data for national accounts, and sample representativeness directly determines the quality of the resulting inferences. Drawing a balanced sample from the enterprise register makes the sample structure similar to the population structure. A balanced sample is one for which the Hansen-Hurwitz estimators of the auxiliary variables equal the true population totals. Balanced sampling designs require a complete sampling frame containing rich auxiliary information, and government statistical data provide sufficient support for this. An empirical analysis based on the 2009 industrial enterprise database shows that the relative error of the balanced design's estimate of the population total is very small; in particular, the estimated mean is very close to the true population value, i.e. approximately unbiased. Compared with simple random sampling, the balanced sampling design is more efficient.
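The balance condition described above can be checked numerically for any candidate sample. The sketch below uses the Horvitz-Thompson form sum(x_i / pi_i) for a without-replacement design (the abstract states the condition via the Hansen-Hurwitz estimator; the function names and toy numbers are ours, and drawing exactly balanced samples would require a method such as the cube method, not shown here).

```python
import numpy as np

def ht_total(x, pi):
    """Horvitz-Thompson estimator of a population total from sample
    values x and their first-order inclusion probabilities pi."""
    return np.sum(np.asarray(x, dtype=float) / np.asarray(pi, dtype=float))

def balance_gap(x_sample, pi, x_pop_total):
    """Relative deviation of the HT estimate of an auxiliary total from
    its known population value; zero for an exactly balanced sample."""
    return abs(ht_total(x_sample, pi) - x_pop_total) / x_pop_total
```

For a population with auxiliary values 1..10 (total 55), the sample of even units with inclusion probability 0.5 gives an HT estimate of 60, a relative gap of about 9%; a balanced design would drive this gap to (near) zero for every auxiliary variable.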


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号