Similar Documents
20 similar documents found (search time: 31 ms)
1.
Summary.  In sample surveys of finite populations, subpopulations for which the sample size is too small for estimation of adequate precision are referred to as small domains. Demand for small domain estimates has been growing in recent years among users of survey data. We explore the possibility of enhancing the precision of domain estimators by combining comparable information collected in multiple surveys of the same population. For this, we propose a regression method of estimation that is essentially an extended calibration procedure whereby comparable domain estimates from the various surveys are calibrated to each other. We show through analytic results and an empirical study that this method may greatly improve the precision of domain estimators for the variables that are common to these surveys, as these estimators make effective use of increased sample size for the common survey items. The design-based direct estimators proposed involve only domain-specific data on the variables of interest. This is in contrast with small domain (mostly small area) indirect estimators, based on a single survey, which incorporate through modelling data that are external to the targeted small domains. The approach proposed is also highly effective in handling the closely related problem of estimation for rare population characteristics.

2.
We consider the problem of supplementing survey data with additional information from a population. The framework we use is very general; examples are missing data problems, measurement error models and combining data from multiple surveys. We do not require the survey data to be a simple random sample of the population of interest. The key assumption we make is that there exists a set of common variables between the survey and the supplementary data. Thus, the supplementary data serve the dual role of providing adjustments to the survey data for model consistencies and also enriching the survey data for improved efficiency. We propose a semi-parametric approach using empirical likelihood to combine data from the two sources. The method possesses favourable large and moderate sample properties. We use the method to investigate wage regression using data from the National Longitudinal Survey of Youth Study.

3.
Many large-scale sample surveys use panel designs under which sampled individuals are interviewed several times before being dropped from the sample. The longitudinal data bases available from such surveys could be used to provide estimates of gross change over time. One problem in using these data to estimate gross change is how to handle the period-to-period nonresponse. This nonresponse is typically nonrandom and, furthermore, may be nonignorable in that it cannot be accounted for by other observed quantities in the data. Under the models proposed in this article, which are appropriate for the analysis of categorical data, the probability of nonresponse may be taken to be a function of the missing variable of interest. The proposed models are fit using maximum likelihood estimation. As an example, the method is applied to the problem of estimating gross flows in labor-force participation using data from the Current Population Survey and the Canadian Labour Force Survey.

4.
Bayesian hierarchical formulations are utilized by the U.S. Bureau of Labor Statistics (BLS) with respondent-level data for missing-item imputation because these formulations are readily parameterized to capture correlation structures. BLS collects survey data under informative sampling designs that assign inclusion probabilities correlated with the response; on these data, sampling-weighted pseudo posterior distributions are estimated for asymptotically unbiased inference about population model parameters. Computation is expensive and does not support BLS production schedules. We propose a new method to scale the computation that divides the data into smaller subsets, estimates a sampling-weighted pseudo posterior distribution, in parallel, for every subset, and combines the pseudo posterior parameter samples from all the subsets through their mean in the Wasserstein space of order 2. We construct conditions on a class of sampling designs under which posterior consistency of the proposed method is achieved. We demonstrate on both synthetic data and in an application to the Current Employment Statistics survey that our method produces results of similar accuracy to the usual approach while offering substantially faster computation.
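In one dimension, the combination step described above has a simple closed form: the order-2 Wasserstein barycenter of empirical distributions with equal weights averages their quantiles. A minimal NumPy sketch on synthetic subset draws (illustrative values only, not the BLS data or the paper's exact estimator):

```python
import numpy as np

def wasserstein2_mean(sample_sets):
    # For one-dimensional empirical distributions, the order-2 Wasserstein
    # barycenter is obtained by quantile averaging: sort each subset's
    # posterior draws and average them position by position.
    sorted_sets = np.sort(np.asarray(sample_sets), axis=1)
    return sorted_sets.mean(axis=0)

# Toy illustration: pseudo posterior draws for a mean parameter from
# three hypothetical data subsets.
rng = np.random.default_rng(0)
subset_draws = [rng.normal(loc=m, scale=1.0, size=1000) for m in (4.8, 5.0, 5.2)]
combined = wasserstein2_mean(subset_draws)
```

The combined draws behave like a single posterior sample centered near the average of the subset centers.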

5.
Reporting sampling errors of survey estimates is a problem that is commonly addressed when compiling a survey report. Because of the vast number of study variables or population characteristics, and of domains of interest in a survey, it is almost impossible to calculate and publish the standard errors for each statistic. A way of overcoming this problem is to estimate the sampling errors indirectly by using generalized variance functions, which define a statistical relationship between the sampling errors and the corresponding estimates. One of the problems with this approach is that the model specification has to be consistent with a roughly constant design effect. If the design effects vary greatly across estimates, as is the case in business surveys, the prediction model is misspecified and the least-squares estimation is biased. In this paper, we present an extension of generalized variance functions that addresses these problems and can be used in contexts similar to those encountered in business surveys. The proposed method has been applied to the Italian Structural Business Statistics Survey.
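One standard GVF specification (a common textbook choice, not necessarily the exact model extended in the paper) relates an estimate's relative variance to its magnitude as relvar(Y) = a + b/Y, fit by least squares. A sketch on exact synthetic data with hypothetical coefficients:

```python
import numpy as np

def fit_gvf(estimates, relvariances):
    # Fit the classic GVF model  relvar(Y) = a + b / Y
    # by ordinary least squares on the regressors [1, 1/Y].
    X = np.column_stack([np.ones_like(estimates), 1.0 / estimates])
    (a, b), *_ = np.linalg.lstsq(X, relvariances, rcond=None)
    return a, b

# Synthetic check: relative variances generated exactly from
# a = 0.002, b = 3.0 (illustrative values).
Y = np.array([120.0, 480.0, 1500.0, 5200.0, 20000.0])
relvar = 0.002 + 3.0 / Y
a, b = fit_gvf(Y, relvar)
```

Once fitted, the standard error of any published estimate Y can be approximated as Y * sqrt(a + b/Y) without computing a direct variance estimate.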

6.
The aim of this paper is to formulate an analytical–informational–theoretical approach which, given the incomplete nature of the available micro-level data, can be used to provide disaggregated values of a given variable. A functional relationship between the variable to be disaggregated and the available variables/indicators at the area level is specified through a combination of different macro- and micro-data sources. Data disaggregation is accomplished by considering two different cases. In the first case, sub-area level information on the variable of interest is available, and a generalized maximum entropy approach is employed to estimate the optimal disaggregate model. In the second case, we assume that the sub-area level information is partial and/or incomplete, and we estimate the model on a smaller scale by developing a generalized cross-entropy-based formulation. The proposed spatial-disaggregation approach is used in relation to an Italian data set in order to compute the value-added per manufacturing sector of local labour systems within the Umbria region, by combining the available micro/macro-level data and by formulating a suitable set of constraints for the optimization problem in the presence of errors in micro-aggregates.

7.
Survey statisticians make use of auxiliary information to improve estimates. One important example is calibration estimation, which constructs new weights that match benchmark constraints on auxiliary variables while remaining “close” to the design weights. Multiple-frame surveys are increasingly used by statistical agencies and private organizations to reduce sampling costs and/or avoid frame undercoverage errors. Several ways of combining estimates derived from such frames have been proposed elsewhere; in this paper, we extend the calibration paradigm, previously used for single-frame surveys, to estimate the total of a variable of interest in a dual-frame survey. Calibration is a general tool that allows the inclusion of auxiliary information from both frames. It also incorporates, as special cases, certain dual-frame estimators that have been proposed previously. The theoretical properties of our class of estimators are derived and discussed, and simulation studies are conducted to compare the efficiency of the procedure using different sets of auxiliary variables. Finally, the proposed methodology is applied to real data from the Barometer of Culture of Andalusia survey.
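For a single frame, chi-square-distance calibration (the GREG case this paradigm builds on) has a closed-form solution. A sketch with hypothetical design weights and benchmark totals; the paper's dual-frame extension, with its frame-membership adjustments, is not shown:

```python
import numpy as np

def linear_calibration(d, X, totals):
    # Chi-square-distance calibration: find weights w = d * (1 + X @ lam)
    # so that the calibrated auxiliary totals X.T @ w hit the benchmarks,
    # while staying close to the design weights d.
    lam = np.linalg.solve(X.T @ (d[:, None] * X), totals - X.T @ d)
    return d * (1.0 + X @ lam)

# Illustrative example: 5 sampled units, two auxiliary variables
# (an intercept for the population count and a size measure).
d = np.array([10.0, 10.0, 20.0, 20.0, 40.0])        # design weights
X = np.array([[1, 2.0], [1, 3.0], [1, 5.0], [1, 7.0], [1, 4.0]])
totals = np.array([110.0, 500.0])                    # known benchmarks
w = linear_calibration(d, X, totals)
```

The calibrated weights reproduce the benchmark totals exactly while perturbing the design weights as little as possible under the chi-square distance.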

8.
The measurement of flows in the American labor market has challenged researchers for years. Several problems, including response variability, rotation group bias, and matching errors, hinder accurate measurement of the flows. The Bureau of the Census and the Bureau of Labor Statistics jointly sponsored a conference to examine these problems, present current research on solutions to the problems, and submit recommendations for improving the data. Recommendations include procedural changes in the Current Population Survey as well as new estimation techniques.

9.
Small area estimation (SAE) concerns how to reliably estimate population quantities of interest when some areas or domains have very limited samples. This is an important issue in large population surveys, because the geographical areas or groups with only small samples, or even no samples, are often of interest to researchers and policy-makers. For example, large population health surveys, such as the Behavioral Risk Factor Surveillance System and the Ohio Medicaid Assessment Survey (OMAS), are regularly conducted for monitoring insurance coverage and healthcare utilization. Classic approaches usually provide accurate estimators at the state level or for large geographical regions, but they fail to provide reliable estimators for many rural counties where the samples are sparse. Moreover, a systematic evaluation of the performance of SAE methods in real-world settings is lacking in the literature. In this paper, we propose a Bayesian hierarchical model with constraints on the parameter space and show that it provides superior estimators for county-level adult uninsured rates in Ohio based on the 2012 OMAS data. Furthermore, we perform extensive simulation studies to compare our methods with a collection of common SAE strategies, including direct estimators, synthetic estimators, composite estimators, and the Bayesian hierarchical model-based estimators of Datta GS, Ghosh M, Steorts R, Maples J. [Bayesian benchmarking with applications to small area estimation. Test 2011;20(3):574–588]. To set a fair basis for comparison, we generate our simulation data with characteristics mimicking the real OMAS data, so that neither model-based nor design-based strategies use the true model specification. The estimators based on our proposed model are shown to outperform other estimators for small areas in both the simulation study and the real data analysis.

10.
Despite having desirable properties, model-assisted estimators are rarely used in anything but their simplest form to produce official statistics. This is due to the fact that the more complicated models are often ill suited to the available auxiliary data. Under a model-assisted framework, we propose a regression tree estimator for a finite-population total. Regression tree models are adept at handling the type of auxiliary data usually available in the sampling frame and provide a model that is easy to explain and justify. The estimator can be viewed as a post-stratification estimator where the post-strata are automatically selected by the recursive partitioning algorithm of the regression tree. We establish consistency of the regression tree estimator and a variance estimator, along with asymptotic normality of the regression tree estimator. We compare the performance of our estimator to other survey estimators using the United States Bureau of Labor Statistics Occupational Employment Statistics Survey data.
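The post-stratification reading of this estimator is easy to sketch: once a regression tree has been fit to the frame's auxiliary variables, its leaves act as post-strata with known population counts. A toy illustration (the leaf labels and counts are hypothetical, and the tree-fitting step itself is omitted):

```python
import numpy as np

def poststrat_total(y, leaf, pop_counts):
    # Post-stratified estimator of a finite-population total:
    # each post-stratum h is a leaf of the fitted regression tree,
    # with known population count N_h; the estimate is sum_h N_h * ybar_h.
    return sum(N_h * y[leaf == h].mean() for h, N_h in pop_counts.items())

# Toy data: 4 sampled units falling into two hypothetical leaves.
y = np.array([2.0, 4.0, 10.0, 12.0])      # study variable on the sample
leaf = np.array([0, 0, 1, 1])             # leaf assignment of each unit
pop_counts = {0: 100, 1: 50}              # population size of each leaf
total_hat = poststrat_total(y, leaf, pop_counts)
```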

11.
The classic recursive bivariate probit model is of particular interest to researchers since it allows for the estimation of the treatment effect that a binary endogenous variable has on a binary outcome in the presence of unobservables. In this article, the authors consider the semiparametric version of this model and introduce a model fitting procedure which permits reliable estimation of the parameters of a system of two binary outcomes with a binary endogenous regressor and smooth functions of continuous covariates. They illustrate the empirical validity of the proposal through an extensive simulation study. The approach is applied to data from a survey, conducted in Botswana, on the impact of education on women's fertility. Some studies suggest that the estimated effect could have been biased by the possible endogeneity arising because unobservable confounders (e.g., ability and motivation) are associated with both fertility and education. The Canadian Journal of Statistics 39: 259–279; 2011 © 2011 Statistical Society of Canada

12.
The Fay–Herriot model is a standard model for direct survey estimators in which the true quantity of interest, the superpopulation mean, is latent and its estimation is improved through the use of auxiliary covariates. In the context of small area estimation, these estimates can be further improved by borrowing strength across spatial regions or by considering multiple outcomes simultaneously. We provide two formulations of the Fay–Herriot model that include both multivariate outcomes and latent spatial dependence. In one formulation the outcome-by-space dependence structure is separable; the other accounts for the cross dependence through a generalized multivariate conditional autoregressive (GMCAR) structure. The GMCAR model is shown, in a state-level example, to produce smaller mean squared prediction errors, relative to equivalent census variables, than the separable model and the state-of-the-art multivariate model with unstructured dependence between outcomes and no spatial dependence. In addition, both the GMCAR and the separable models give smaller mean squared prediction errors than the state-of-the-art model when conducting small area estimation on county-level data from the American Community Survey.
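Stripped of the spatial and multivariate structure, the basic Fay–Herriot predictor is a precision-weighted compromise between an area's direct estimate and its regression-synthetic estimate. A minimal sketch with illustrative numbers (known variances assumed; in practice the model variance is estimated):

```python
def fh_shrinkage(y_direct, x_beta, D, A):
    # Best predictor of the area mean under the Fay-Herriot model:
    # shrink the direct estimate (sampling variance D) toward the
    # regression-synthetic estimate x'beta (model variance A).
    gamma = A / (A + D)
    return gamma * y_direct + (1.0 - gamma) * x_beta

# Illustrative area: direct estimate 10.0, synthetic estimate 8.0,
# equal sampling and model variances, so the predictor splits the
# difference.
theta_hat = fh_shrinkage(y_direct=10.0, x_beta=8.0, D=1.0, A=1.0)
```

Areas with noisy direct estimates (large D) are pulled harder toward the synthetic value, which is exactly the borrowing of strength the abstract describes.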

13.
In the National Survey of Sexual Attitudes and Lifestyles (NATSSAL), it is recognized that non-response is unlikely to be ignorable. In some surveys, in addition to the response variables of interest, there may also be an 'enthusiasm-to-respond' variable which is expected to be related to the probabilities of item and unit response. Inference techniques to deal with non-ignorable non-response, based on a propensity-to-respond score, can be developed when there are both item and unit non-responders. For the NATSSAL data, an interviewer-measured interviewee embarrassment variable is combined with demographics to produce a score for the propensity to respond. The necessary likelihood development is outlined and alternative approaches to interval estimation are compared. The methodology is illustrated through an estimation of virginity from NATSSAL data.

14.
Sampling has evolved into a universally accepted approach to gathering information and data mining, as it is widely accepted that a reasonably modest-sized sample can sufficiently characterize a much larger population. In stratified sampling designs, the whole population is divided into homogeneous strata in order to achieve higher precision in the estimation. This paper proposes an efficient method of constructing optimum stratum boundaries (OSB) and determining the optimum sample size (OSS) for the survey variable. In practice, the survey variable is often unavailable prior to conducting the survey, so the method is based on an auxiliary variable, which is usually readily available from past surveys. To illustrate the application with real data, the auxiliary variable considered here follows a Weibull distribution. The stratification problem is formulated as a Mathematical Programming Problem (MPP) that seeks to minimize the variance of the estimated population parameter under Neyman allocation. The solution procedure employs the dynamic programming technique, which results in substantial gains in the precision of the estimates of the population characteristics.
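The Neyman allocation entering the objective function above is a one-liner once stratum boundaries are fixed: sample sizes are proportional to N_h * S_h. A sketch with hypothetical stratum sizes and standard deviations (the boundary-search and dynamic programming steps are not shown):

```python
def neyman_allocation(n, N, S):
    # Neyman allocation: n_h proportional to N_h * S_h, which minimizes
    # the variance of the stratified estimator for fixed total size n.
    weights = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Two illustrative strata of equal size; the more variable stratum
# receives proportionally more of the sample.
alloc = neyman_allocation(n=100, N=[1000, 1000], S=[1.0, 3.0])
```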

15.
Calibration techniques in survey sampling, such as generalized regression estimation (GREG), were formalized in the 1990s to produce efficient estimators of linear combinations of study variables, such as totals or means. They implicitly rely on the assumption of a linear regression model between the variable of interest and some auxiliary variables, yielding estimates with lower variance if the model is true while remaining approximately design-unbiased even if the model does not hold. We propose a new class of model-assisted estimators obtained by releasing a few calibration constraints and replacing them with a penalty term added to the distance criterion to be minimized. By introducing the concept of penalized calibration, combining usual calibration and this 'relaxed' calibration, we are able to adjust the weight given to the available auxiliary information. We obtain a more flexible estimation procedure that gives better estimates, particularly when the auxiliary information is overly abundant or not fully appropriate for complete use. Such an approach can also be seen as a design-based alternative to estimation procedures based on the more general class of mixed models, presenting new prospects in some scopes of application such as inference on small domains.

16.
Among the goals of statistical matching, a very important one is the estimation of the joint distribution of variables not jointly observed in a sample survey but separately available from independent sample surveys. The absence of joint information on the variables of interest leads to uncertainty about the data generating model, since the available sample information is unable to discriminate among a set of plausible joint distributions. In the present paper, a short review of the concept of uncertainty in statistical matching under logical constraints is presented, as well as of how to measure uncertainty for continuous variables. The notion of matching error is related to an appropriate measure of uncertainty, and a criterion for selecting matching variables, by choosing the variables minimizing such an uncertainty measure, is introduced. Finally, a method to choose a plausible joint distribution for the variables of interest via the iterative proportional fitting algorithm is described. The proposed methodology is then applied to household income and expenditure data when extra sample information regarding the average propensity to consume is available. This leads to a reconstructed complete dataset in which each record includes measures of income and expenditure.
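The iterative proportional fitting step mentioned above alternately rescales a seed joint table until its margins match the targets. A minimal sketch with a hypothetical income-by-expenditure seed table (illustrative numbers, not the paper's data):

```python
import numpy as np

def ipf(seed, row_margins, col_margins, iters=50):
    # Iterative proportional fitting: alternately rescale the rows and
    # columns of a seed joint table until its margins match the targets.
    # The result preserves the seed's interaction structure (odds ratios).
    T = seed.astype(float).copy()
    for _ in range(iters):
        T *= (row_margins / T.sum(axis=1))[:, None]
        T *= (col_margins / T.sum(axis=0))[None, :]
    return T

# Hypothetical 2x2 income-by-expenditure seed and target margins
# (both margins sum to the same grand total, 100).
seed = np.array([[40.0, 10.0], [20.0, 30.0]])
T = ipf(seed, row_margins=np.array([60.0, 40.0]),
        col_margins=np.array([55.0, 45.0]))
```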

17.
Summary.  The first British National Survey of Sexual Attitudes and Lifestyles (NATSAL) was conducted in 1990–1991 and the second in 1999–2001. When surveys are repeated, the changes in population parameters are of interest and are generally estimated from a comparison of the data between surveys. However, since all surveys may be subject to bias, such comparisons may partly reflect a change in bias. Typically limited external data are available to estimate the change in bias directly. However, one approach, which is often possible, is to define in each survey a sample of participants who are eligible for both surveys, and then to compare the reporting of selected events that occurred before the earlier survey time point. A difference in reporting suggests a change in overall survey bias between time points, although other explanations are possible. In NATSAL, changes in bias are likely to be similar for groups of sexual experiences. The grouping of experiences allows the information that is derived from the selected events to be incorporated into inference concerning population changes in other sexual experiences. We use generalized estimating equations, which incorporate weighting for differential probabilities of sampling and non-response in a relatively straightforward manner. The results, combined with estimates of the change in reporting, are used to derive minimum established population changes, based on NATSAL data. For some key population parameters, the change in reporting is seen to be consistent with a change in bias alone. Recommendations are made for the design of future surveys.

18.
于力超, 金勇进. Statistical Research (《统计研究》), 2018, 35(11): 93-104
Large-scale sample surveys typically use complex sampling designs, yielding survey data sets with a hierarchically nested structure in which missing data are unavoidable; imputation strategies for hierarchical data sets with missing values have so far received little attention. This paper applies the Gibbs sampler to multiple imputation for hierarchical data sets with missing values, studying imputation based on a fixed-effects model and on a random-effects model. Through theoretical derivation and numerical simulation, under different intraclass correlation coefficients, group sizes, and proportions of missing data, the imputation performance of the two methods is compared in terms of the unbiasedness and efficiency of the resulting parameter estimates, and recommendations for choosing the imputation model are given. The results show that the random-effects imputation model yields more accurate parameter estimates, while the fixed-effects imputation model is simpler to operate; when the proportion of missing data is small, the intraclass correlation is large, and the groups are large, the fixed-effects imputation model can be used, and otherwise the random-effects imputation model is recommended.

19.
Estimation of price indexes in the United States is generally based on complex rotating panel surveys. The sample for the Consumer Price Index, for example, is selected in three stages—geographic areas, establishments, and individual items—with 20% of the sample being replaced by rotation each year. At each period, a time series of data is available for use in estimation. This article examines how to best combine data for estimation of long-term and short-term changes and how to estimate the variances of the index estimators in the context of two-stage sampling. I extend the class of estimators, introduced by Valliant and Miller, of Laspeyres indexes formed using sample data collected from the current period back to a previous base period. Linearization estimators of variance for indexes of long-term and short-term change are derived. The theory is supported by an empirical simulation study using two-stage sampling of establishments and items from a population derived from U.S. Bureau of Labor Statistics data.
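The Laspeyres index underlying this class of estimators weights current-period prices by base-period quantities. A toy sketch with two illustrative items (the two-stage sampling and variance-linearization machinery of the paper is not shown):

```python
def laspeyres(p0, q0, p1):
    # Laspeyres price index: current-period prices weighted by
    # base-period quantities, relative to base-period expenditure.
    num = sum(p1i * q0i for p1i, q0i in zip(p1, q0))
    den = sum(p0i * q0i for p0i, q0i in zip(p0, q0))
    return num / den

# Two hypothetical items, each with a 10% price increase, so the
# index should equal 1.10 regardless of the quantity weights.
idx = laspeyres(p0=[1.0, 2.0], q0=[10.0, 5.0], p1=[1.1, 2.2])
```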

20.
Adaptive Lasso Quantile Regression for Panel Data
How to automatically select the important explanatory variables while estimating the parameters has long been one of the central questions in panel-data quantile regression. This paper constructs a Bayesian hierarchical quantile regression model with multiple random effects and, assuming that the fixed-effect coefficients follow a new conditional Laplace prior, derives a Gibbs sampling algorithm for parameter estimation. Since the coefficients of explanatory variables of different importance should be shrunk to different degrees, the constructed prior is adaptive, so that the important explanatory variables in the model can be selected automatically and accurately, and the slice Gibbs sampling algorithm designed here quickly and effectively solves the posterior-mean estimation of all model parameters. Simulation results show that the new method outperforms the methods commonly used in the literature in both the accuracy of parameter estimation and the accuracy of variable selection. A panel-data analysis of several macroeconomic indicators across Chinese regions demonstrates the new method's ability to estimate parameters and select variables.
