Similar articles
20 similar articles found (search time: 640 ms)
1.
MODEL-BASED VARIANCE ESTIMATION IN SURVEYS WITH STRATIFIED CLUSTERED DESIGN   Total citations: 1, self-citations: 0, citations by others: 1
A model-based method for estimating the sampling variances of estimators of (sub-)population means, proportions, quantiles, and regression parameters in surveys with stratified clustered design is described and applied to a survey of US secondary education. The method is compared with the jackknife by a simulation study. The model-based estimators of the sampling variances have much smaller mean squared errors than their jackknife counterparts. In addition, they can be improved by incorporating information about the unknown parameters (variances) from external sources. A regression-based smoothing method for estimating the sampling variances of the estimators for a large number of subpopulation means is proposed. Such smoothing may be invaluable when subpopulations are represented in the sample by only a few subjects.

2.
Summary.  Complex survey sampling is often used to sample a fraction of a large finite population. In general, the survey is conducted so that each unit (e.g. subject) in the sample has a different probability of being selected into the sample. For generalizability of the sample to the population, both the design and the probability of being selected into the sample must be incorporated in the analysis. In this paper we focus on non-standard regression models for complex survey data. In our motivating example, which is based on data from the Medical Expenditure Panel Survey, the outcome variable is the subject's 'total health care expenditures in the year 2002'. Previous analyses of medical cost data suggest that the variance is approximately equal to the mean raised to the power of 1.5, which is a non-standard variance function. Currently, the regression parameters for this model cannot be easily estimated in standard statistical software packages. We propose a simple two-step method to obtain consistent regression parameter and variance estimates; the method proposed can be implemented within any standard sample survey package. The approach is applicable to complex sample surveys with any number of stages.

3.
A practical problem with large scale survey data is the potential for overdispersion. Overdispersion occurs when the data display more variability than is predicted by the variance–mean relationship for the assumed sampling model. This paper describes a simple strategy for detecting and adjusting for overdispersion in large scale survey data. The method is primarily motivated by data on the relationship between social class and educational attainment obtained from a 2% sample from the 1991 census of the population of Great Britain. Overdispersion can be detected by first grouping the data into a number of strata of approximately equal size. Under the assumption that the observations are independent and there is no variability in the parameter of interest, there is a direct relationship between the nominal standard errors and the empirical or sample standard deviation of the parameter estimates obtained from each of the separate strata. With the 2% sample from the British census data, quite a discernible departure from this relationship was found, indicating overdispersion. After allowing for overdispersion, improved and more realistic measures of precision of the strength of the social class–education associations were obtained.
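The detection idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: the parameter of interest is taken to be a simple proportion (the paper studies class-education associations), strata are contiguous blocks in file order, and the function name `dispersion_ratio` and the simulated data are hypothetical.

```python
import math
import random
import statistics

def dispersion_ratio(y, n_strata):
    """Empirical SD of stratum estimates divided by their mean nominal SE.

    y is a sequence of 0/1 outcomes in file order; strata are contiguous
    blocks of (almost) equal size. Under independence and a constant
    proportion the ratio should be near 1; values well above 1 suggest
    overdispersion. (Assumed setup, for illustration only.)
    """
    n = len(y)
    p_pooled = sum(y) / n
    # Block boundaries that split the data into n_strata nearly equal parts.
    bounds = [round(k * n / n_strata) for k in range(n_strata + 1)]
    blocks = [y[bounds[k]:bounds[k + 1]] for k in range(n_strata)]
    estimates = [sum(b) / len(b) for b in blocks]
    # Nominal binomial standard error of each stratum estimate.
    nominal = [math.sqrt(p_pooled * (1 - p_pooled) / len(b)) for b in blocks]
    return statistics.stdev(estimates) / statistics.mean(nominal)

rng = random.Random(0)
# Independent data with a constant success probability.
independent = [int(rng.random() < 0.3) for _ in range(20000)]
# Overdispersed data: the success probability varies from block to block.
clustered = []
for _ in range(40):
    p_block = rng.betavariate(3, 7)
    clustered.extend(int(rng.random() < p_block) for _ in range(500))

print(dispersion_ratio(independent, 40))
print(dispersion_ratio(clustered, 40))
```

A simple adjustment in this spirit would be to inflate the nominal standard errors by the estimated ratio, although the abstract does not say which adjustment the paper actually uses.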

4.
Many authors have shown that a combined analysis of data from two or more types of recapture survey brings advantages, such as the ability to provide more information about parameters of interest. For example, a combined analysis of annual resighting and monthly radio-telemetry data allows separate estimates of true survival and emigration rates, whereas only apparent survival can be estimated from the resighting data alone. For studies involving more than one type of survey, biologists should consider how to allocate the total budget to the surveys related to the different types of marks so that they will gain optimal information from the surveys. For example, since radio tags and subsequent monitoring are very costly, while leg bands are cheap, the biologists should try to balance costs with information obtained in deciding how many animals should receive radios. Given a total budget and specific costs, it is possible to determine the allocation of sample sizes to different types of marks in order to minimize the variance of parameters of interest, such as annual survival and emigration rates. In this paper, we propose a cost function for a study where all birds receive leg bands and a subset receives radio tags and all new releases occur at the start of the study. Using this cost function, we obtain the allocation of sample sizes to the two survey types that minimizes the standard error of survival rate estimates or, alternatively, the standard error of emigration rates. Given the proposed costs, we show that for high resighting probability, e.g. 0.6, tagging roughly 10-40% of birds with radios will give survival estimates with standard errors within the minimum range. Lower resighting rates will require a higher percentage of radioed birds. In addition, the proposed costs require tagging the maximum possible percentage of radioed birds to minimize the standard error of emigration estimates.

5.
On the planning and design of sample surveys   Total citations: 1, self-citations: 1, citations by others: 0
Surveys rely on structured questions used to map out reality, using sample observations from a population frame, into data that can be statistically analyzed. This paper focuses on the planning and design of surveys, making a distinction between individual surveys, household surveys and establishment surveys. Knowledge from cognitive science is used to provide guidelines on questionnaire design. Non-standard, but simple, statistical methods are described for analyzing survey results. The paper is based on experience gained by conducting over 150 customer satisfaction surveys in Europe, America and the Far East.

6.
Summary.  Time series arise often in environmental monitoring settings, which typically involve measuring processes repeatedly over time. In many such applications, observations are irregularly spaced and, additionally, are not distributed normally. An example is water monitoring data collected in Boston Harbor by the Massachusetts Water Resources Authority. We describe a simple robust approach for estimating regression parameters and a first-order autocorrelation parameter in a time series where the observations are irregularly spaced. Estimates are obtained from an estimating equation that is constructed as a linear combination of estimated innovation errors, suitably made robust by symmetric and possibly bounded functions. Under an assumption of data missing completely at random and mild regularity conditions, the proposed estimating equation yields consistent and asymptotically normal estimates. Simulations suggest that our estimator performs well in moderate sample sizes. We demonstrate our method on Secchi depth data collected from Boston Harbor.

7.
In many situations information from a sample of individuals can be supplemented by population level information on the relationship between a dependent variable and explanatory variables. Inclusion of the population level information can reduce bias and increase the efficiency of the parameter estimates. Population level information can be incorporated via constraints on functions of the model parameters. In general the constraints are nonlinear, making the task of maximum likelihood estimation harder. In this paper we develop an alternative approach exploiting the notion of an empirical likelihood. It is shown that within the framework of generalised linear models, the population level information corresponds to linear constraints, which are comparatively easy to handle. We provide a two-step algorithm that produces parameter estimates using only unconstrained estimation. We also provide computable expressions for the standard errors. We give an application to demographic hazard modelling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity.

8.
Summary.  Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate basis or on a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

9.
Generalised variance function (GVF) models are data analysis techniques often used in large-scale sample surveys to approximate the design variance of point estimators for population means and proportions. Some potential advantages of the GVF approach include operational simplicity, more stable sampling error estimates and a convenient method of summarising results when a high number of survey variables is considered. In this paper, several parametric and nonparametric methods for GVF estimation with binary variables are proposed and compared. The behaviour of these estimators is analysed under heteroscedasticity and in the presence of outliers and influential observations. An empirical study based on the annual survey of living conditions in Galicia (a region in the northwest of Spain) illustrates the behaviour of the proposed estimators.

10.
Summary.  In sample surveys of finite populations, subpopulations for which the sample size is too small for estimation of adequate precision are referred to as small domains. Demand for small domain estimates has been growing in recent years among users of survey data. We explore the possibility of enhancing the precision of domain estimators by combining comparable information collected in multiple surveys of the same population. For this, we propose a regression method of estimation that is essentially an extended calibration procedure whereby comparable domain estimates from the various surveys are calibrated to each other. We show through analytic results and an empirical study that this method may greatly improve the precision of domain estimators for the variables that are common to these surveys, as these estimators make effective use of increased sample size for the common survey items. The design-based direct estimators proposed involve only domain-specific data on the variables of interest. This is in contrast with small domain (mostly small area) indirect estimators, based on a single survey, which incorporate through modelling data that are external to the targeted small domains. The approach proposed is also highly effective in handling the closely related problem of estimation for rare population characteristics.

11.
We consider the problem of supplementing survey data with additional information from a population. The framework we use is very general; examples are missing data problems, measurement error models and combining data from multiple surveys. We do not require the survey data to be a simple random sample of the population of interest. The key assumption we make is that there exists a set of common variables between the survey and the supplementary data. Thus, the supplementary data serve the dual role of providing adjustments to the survey data for model consistencies and also enriching the survey data for improved efficiency. We propose a semi-parametric approach using empirical likelihood to combine data from the two sources. The method possesses favourable large and moderate sample properties. We use the method to investigate wage regression using data from the National Longitudinal Survey of Youth Study.

12.
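Gong Hongyu (巩红禹), Chen Ya (陈雅). Statistical Research (《统计研究》), 2018, 35(12): 113-122
This paper addresses two problems: improving sample representativeness and conducting multi-objective surveys. First, it proposes a new multi-objective sampling method for improving sample representativeness that combines increasing the sample size with adjusting the sample structure: a balanced design with supplementary samples. Supplementary units are drawn so that, combined with the original sample, they form a new balanced sample, which reduces the structural deviation between sample and population relative to the initial sample. A balanced sample is one for which the Horvitz-Thompson estimator of the auxiliary variable totals equals the true population totals. Second, by choosing auxiliary variables that are correlated with several target parameters, a balanced sample makes a single sample well representative for different target parameters, which makes a multi-objective survey possible. Using county-level data from the Sixth National Population Census of 2010 and selecting several target parameters, a post hoc evaluation of the balanced sample obtained after supplementation shows that the supplementary balanced design effectively improves the sample structure, bringing it close to the population structure and reducing the error of the target estimates. It also shows that balanced sampling designs can support multi-objective surveys and improve the efficiency with which the sample is used.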
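The balance condition stated in this abstract can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the paper: $U$ is the population, $s$ the sample, $\pi_k$ the inclusion probability of unit $k$, and $x_k$ its vector of auxiliary variables.

```latex
\hat{X}_{\mathrm{HT}} \;=\; \sum_{k \in s} \frac{x_k}{\pi_k} \;=\; \sum_{k \in U} x_k \;=\; X
```

In practice the equality can usually only be satisfied approximately, since the sample must contain whole units.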

13.
The linear regression model is commonly used in applications. One of the assumptions made is that the error variances are constant across all observations. This assumption, known as homoskedasticity, is frequently violated in practice. A commonly used strategy is to estimate the regression parameters by ordinary least squares and to compute standard errors that deliver asymptotically valid inference under both homoskedasticity and heteroskedasticity of an unknown form. Several consistent standard errors have been proposed in the literature, and evaluated in numerical experiments based on their point estimation performance and on the finite sample behaviour of associated hypothesis tests. We build upon the existing literature by constructing heteroskedasticity-consistent interval estimators and numerically evaluating their finite sample performance. Different bootstrap interval estimators are also considered. The numerical results favour the HC4 interval estimator.
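As a rough illustration of what an HC4-type interval looks like, the sketch below handles the special case of simple linear regression (intercept plus one slope) using only the standard library. It assumes the commonly cited HC4 weighting, in which each squared residual is divided by $(1-h_i)^{\delta_i}$ with $\delta_i = \min(4, n h_i / p)$, and a normal critical value; the abstract does not give these details, so both are assumptions, and `hc4_slope_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean

def hc4_slope_interval(x, y, level=0.95):
    """OLS slope with an HC4-type confidence interval (simple regression only)."""
    n, p = len(x), 2  # p counts the intercept and the slope
    xbar, ybar = mean(x), mean(y)
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * yi for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Leverages (diagonal of the hat matrix) for simple regression.
    lev = [1 / n + d * d / sxx for d in dx]
    # Sandwich variance of the slope with HC4-weighted squared residuals.
    meat = 0.0
    for d, e, h in zip(dx, resid, lev):
        delta = min(4.0, n * h / p)
        meat += d * d * e * e / (1 - h) ** delta
    se = math.sqrt(meat) / sxx
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * se, slope + z * se

# Illustrative data: the noise amplitude grows with x (heteroskedastic).
x = list(range(1, 21))
y = [1 + 2 * xi + 0.1 * xi * (-1) ** xi for xi in x]
print(hc4_slope_interval(x, y))
```

The exponent $\delta_i$ inflates the residuals of high-leverage observations more strongly than those of typical observations.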

14.
The author compares aspects of voluntary and involuntary sample surveys in West Germany. "The German microcensus as a non-voluntary survey draws a random sample from the total population which includes persons that would also respond in a voluntary survey (respondents) and persons that would not respond (non-respondents). The population of a voluntary survey, however, includes only respondents. Hence, statistical inference from a voluntary sample survey is only valid for the total population, if the population of respondents does not differ from the total population. This null hypothesis must be rejected from the comparisons of data from the German microcensus of 1985, 1986 and 1987 and corresponding voluntary test sample surveys. The discrepancies are great in central demographic and socio-economic variables such as region of residence, community size, age, marital status, income and social security." (SUMMARY IN ENG)

15.
In recent years the focus of research in survey sampling has changed to include a number of nontraditional topics such as nonsampling errors. In addition, the availability of data from large-scale sample surveys, along with computers and software to analyze the data, has changed the tools needed by survey sampling statisticians. It has also resulted in a diverse group of secondary data users who wish to learn how to analyze data from a complex survey. Thus it is time to reassess what we should be teaching students about survey sampling. This article brings together a panel of experts on survey sampling and teaching to discuss their views on what should be taught in survey sampling classes and how it should be taught.

16.
This article discusses the problem of testing the equality of two nonparametric regression functions against two-sided alternatives for uniform design on [0,1] with long memory moving average errors. The standard deviations and the long memory parameters are possibly different for the two errors. The article adapts the partial sum process idea used in the independent observations settings to construct the tests and derives their asymptotic null distributions. The article also shows that these tests are consistent for general alternatives and obtains their limiting distributions under a sequence of local alternatives. Since the limiting null distributions of these tests are unknown, we first conducted a Monte Carlo simulation study to obtain a few selected critical values of the proposed tests. Then based on these critical values, another Monte Carlo simulation is conducted to study the finite sample level and power behavior of these tests at some alternatives. The article also contains a simulation study that assesses the effect of estimating the nonparametric regression function on an estimate of the long memory parameter of the errors. It is observed that the estimate based on direct observations is generally preferable over the one based on the estimated nonparametric residuals.

17.
The properties of the estimators of population mean arising from the ratio and product methods of estimation in the context of sample surveys have been analyzed in this paper when the observations on both the study and auxiliary variables are contaminated with measurement errors. The measurement errors in both the variables are also correlated. The properties of the ratio and product estimators along with the sample mean under the influence of measurement errors are derived and studied. The properties of the estimators in finite samples are studied through Monte-Carlo simulation and its findings are reported.
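For readers unfamiliar with the two estimators named in this abstract, a minimal sketch of their textbook forms is given below. It does not model measurement error, which is the paper's actual subject; the function names and the toy data are hypothetical, and the population mean of the auxiliary variable is assumed to be known.

```python
from statistics import mean

def ratio_estimate(y, x, x_pop_mean):
    """Ratio estimator of the population mean of y: ybar * Xbar / xbar."""
    return mean(y) * x_pop_mean / mean(x)

def product_estimate(y, x, x_pop_mean):
    """Product estimator of the population mean of y: ybar * xbar / Xbar."""
    return mean(y) * mean(x) / x_pop_mean

# Toy sample: study variable y, auxiliary variable x, known population mean of x.
y = [10, 12, 14]
x = [5, 6, 7]
x_pop_mean = 6.5

print(ratio_estimate(y, x, x_pop_mean))    # 12 * 6.5 / 6 = 13.0
print(product_estimate(y, x, x_pop_mean))  # 12 * 6 / 6.5, about 11.08
```

The ratio form is generally preferred when the study and auxiliary variables are positively correlated, and the product form when they are negatively correlated.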

18.
We consider non-response models for a single categorical response with categorical covariates whose values are always observed. We present Bayesian methods for ignorable models and a particular non-ignorable model, and we argue that standard methods of model comparison are inappropriate for comparing ignorable and non-ignorable models. Uncertainty about ignorability of non-response is incorporated by introducing parameters describing the extent of non-ignorability into a pattern mixture specification and integrating over the prior uncertainty associated with these parameters. Our approach is illustrated using polling data from the 1992 British general election panel survey. We suggest sample size adjustments for surveys when non-ignorable non-response is expected.

19.
A random coefficient polynomial regression model has been considered for prediction purposes when there is uncertainty about the degree of the polynomial. Expressions for the mean square errors of two predictors based on simple estimators have been derived, and their performances have been compared when parameters are estimated from the sample. A modified predictor has also been suggested for the case in which the parameters in the predicting equations are to be estimated from the sample. The performance of several predictors has been compared by a cross-validation technique on a real set of data.

20.
Adaptive cluster sampling is an efficient method of estimating the parameters of rare and clustered populations. The method mimics how biologists would like to collect data in the field by targeting survey effort to localised areas where the rare population occurs. Another popular sampling design is inverse sampling. Inverse sampling was developed so as to be able to obtain a sample of rare events having a predetermined size. Ideally, in inverse sampling, the resultant sample set will be sufficiently large to ensure reliable estimation of population parameters. In an effort to combine the good properties of these two designs, adaptive cluster sampling and inverse sampling, we introduce inverse adaptive cluster sampling with unequal selection probabilities. We develop an unbiased estimator of the population total that is applicable to data obtained from such designs. We also develop numerical approximations to this estimator. The efficiency of the estimators that we introduce is investigated through simulation studies based on two real populations: crabs in Al Khor, Qatar and arsenic pollution in Kurdistan, Iran. The simulation results show that our estimators are efficient.
