Similar literature
20 similar documents found (search time: 15 ms)
1.
A common scenario in finite population inference is that it is possible to find a working superpopulation model which explains the main features of the population but which may not capture all the fine details. In addition, there are often outliers in the population which do not follow the assumed superpopulation model. In situations like these, it is still advantageous to make use of the working model to estimate finite population quantities, provided that we do it in a robust manner. The approach that we suggest is first to fit the working model to the sample and then to fine-tune for departures from the model assumed by estimating the conditional distribution of the residuals as a function of the auxiliary variable. This is a more direct approach to handling outliers and model misspecification than the Huber approach that is currently being used. Two simple methods, stratification and nearest neighbour smoothing, are used to estimate the conditional distributions of the residuals, which result in two modifications to the standard model-based estimator of the population distribution function. The estimators suggested perform very well in simulation studies involving two types of model departure and have small variances due to their model-based construction as well as acceptable bias. The potential advantage of the proposed robustified model-based approach over direct nonparametric regression is also demonstrated.

2.
On the planning and design of sample surveys (total citations: 1; self-citations: 1; citations by others: 0)
Surveys rely on structured questions to map reality, through sample observations from a population frame, into data that can be analyzed statistically. This paper focuses on the planning and design of surveys, making a distinction between individual surveys, household surveys and establishment surveys. Knowledge from cognitive science is used to provide guidelines on questionnaire design. Non-standard, but simple, statistical methods are described for analyzing survey results. The paper is based on experience gained by conducting over 150 customer satisfaction surveys in Europe, America and the Far East.

3.
Summary.  The number of people to select within selected households has significant consequences for the conduct and output of household surveys. The operational and data quality implications of this choice are carefully considered in many surveys, but the effect on statistical efficiency is not well understood. The usual approach is to select all people in each selected household, where operational and data quality concerns make this feasible. If not, one person is usually selected from each selected household. We find that this strategy is not always justified, and we develop intermediate designs between these two extremes. Current practices were developed when household survey field procedures needed to be simple and robust; however, more complex designs are now feasible owing to the increasing use of computer-assisted interviewing. We develop more flexible designs by optimizing survey cost, based on a simple cost model, subject to a required variance for an estimator of population total. The innovation lies in the fact that household sample sizes are small integers, which creates challenges in both design and estimation. The new methods are evaluated empirically by using census and health survey data, showing considerable improvement over existing methods in some cases.

4.
This paper analyzes the properties of the estimators of the population mean arising from the ratio and product methods of estimation in sample surveys when the observations on both the study and auxiliary variables are contaminated with measurement errors. The measurement errors in the two variables are also correlated. The properties of the ratio and product estimators, along with those of the sample mean, are derived and studied under the influence of measurement errors. The finite-sample properties of the estimators are studied through Monte Carlo simulation, and the findings are reported.
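For orientation, the textbook forms of the two estimators named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's derivation: the function names, the toy data and the known population mean `x_pop_mean` of the auxiliary variable are assumptions introduced here, and no measurement error is simulated.

```python
from statistics import mean

def ratio_estimator(y, x, x_pop_mean):
    """Classical ratio estimator of the population mean of y:
    sample mean of y times (known population mean of x / sample mean of x)."""
    return mean(y) * x_pop_mean / mean(x)

def product_estimator(y, x, x_pop_mean):
    """Classical product estimator of the population mean of y:
    sample mean of y times (sample mean of x / known population mean of x)."""
    return mean(y) * mean(x) / x_pop_mean

# Hypothetical sample: y is the study variable, x the auxiliary variable.
y = [12.0, 15.0, 9.0, 14.0]
x = [6.0, 8.0, 4.0, 7.0]
print(ratio_estimator(y, x, x_pop_mean=6.0))
print(product_estimator(y, x, x_pop_mean=6.0))
```

In the setting the abstract describes, the observed `y` and `x` would each be the true value plus a (possibly correlated) error term, and these same formulas would be applied to the contaminated observations.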

5.
Summary. Inflation-type weighted estimators for variance components can be badly biased. Modified weighted estimators suggested in the literature are also badly biased for certain sampling designs. We propose new estimators for variance components, some of which are approximately unbiased regardless of the sampling design. These estimators require knowledge of the joint inclusion probabilities of the observations. The small sample properties of the estimators are studied via simulation for the simple one-way random-effects model. An application is given by using data from the US Hispanic Health and Nutrition Examination Survey.

6.
The article presents extensive results from testing for bias and serially correlated errors in a collection of time series of quarterly multiperiod forecasts for six variables including real GNP growth, inflation, and unemployment. The analysis covers responses by 79 frequent participants in economic outlook surveys conducted regularly since 1968. It shows a much greater incidence of apparently systematic errors for inflation than for the other variables. Also, the tests are more favorable to composite group forecasts than to most of the individual forecast sets.

7.
In the National Survey of Sexual Attitudes and Lifestyles (NATSSAL), it is recognized that non-response is unlikely to be ignorable. In some surveys, in addition to the response variables of interest, there may also be an 'enthusiasm-to-respond' variable which is expected to be related to the probabilities of item and unit response. Inference techniques to deal with non-ignorable non-response, based on a propensity-to-respond score, can be developed when there are both item and unit non-responders. For the NATSSAL data, an interviewer-measured interviewee embarrassment variable is combined with demographics to produce a score for the propensity to respond. The necessary likelihood development is outlined and alternative approaches to interval estimation are compared. The methodology is illustrated through an estimation of virginity from NATSSAL data.

8.
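In wage gap decomposition, researchers often encounter sample selection bias, and ignoring it leads to seriously biased final estimates. Among the many wage gap decomposition methods, researchers favor distributional decomposition over mean decomposition. For parametric quantile regression, this paper is the first to propose additive and non-additive forms of the sample selection parametric quantile regression (SSPQR) model and, based on these two types of model, gives a parametric quantile regression method for decomposing the wage gap distribution after correcting for sample selection bias. Using this method together with existing wage distribution decomposition methods, and drawing on the CHNS 2015 urban data, the paper studies the wage gap between urban men and women in China and its decomposition, and reaches the following conclusions: (1) the gender wage gap stems mainly from gender discrimination; (2) after correcting for sample selection bias, the actual wage gap is larger and the discrimination more severe; (3) the size of the gender wage gap differs across quantiles, in other words, the gap cannot be judged from the average level alone; (4) compared with the results of other existing methods, SSPQR yields a larger wage gap.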

9.
Sample coordination maximizes or minimizes the overlap of two or more samples selected from overlapping populations. It can be applied to designs with simultaneous or sequential selection of samples. We propose a method for sample coordination in the former case. We consider the case where units are to be selected with maximum overlap using two designs with given unit inclusion probabilities. The degree of coordination is measured by the expected sample overlap, which is bounded above by a theoretical bound, called the absolute upper bound, and which depends on the unit inclusion probabilities. If the expected overlap equals the absolute upper bound, the sample coordination is maximal. Most of the methods given in the literature consider fixed marginal sampling designs, but in many cases, the absolute upper bound is not achieved. We propose to construct optimal sampling designs for given unit inclusion probabilities in order to realize maximal coordination. Our method is based on some theoretical conditions on joint selection probability of two samples and on the controlled selection method with linear programming implementation. The method can also be applied to minimize the sample overlap.
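As a rough illustration of the quantities mentioned in this abstract, the sketch below computes the expected overlap when the two samples are drawn independently and compares it with the absolute upper bound. It assumes the form of the bound commonly given in the coordination literature, the sum over units of the smaller of the two inclusion probabilities; the abstract itself does not state the formula, and the function names and probabilities are hypothetical.

```python
def expected_overlap_independent(pi1, pi2):
    """Expected number of common units when the two samples are selected
    independently: each unit is in both samples with probability pi1 * pi2."""
    return sum(a * b for a, b in zip(pi1, pi2))

def absolute_upper_bound(pi1, pi2):
    """Assumed form of the absolute upper bound on the expected overlap:
    a unit cannot be in both samples more often than it is in either one."""
    return sum(min(a, b) for a, b in zip(pi1, pi2))

# Hypothetical inclusion probabilities for three units under two designs.
pi1 = [0.5, 0.2, 0.8]
pi2 = [0.4, 0.6, 0.8]
print(expected_overlap_independent(pi1, pi2))
print(absolute_upper_bound(pi1, pi2))
```

The gap between the two printed values indicates how much room a coordinated design has to improve on independent selection for these probabilities.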

10.
Estimation of a characteristic based on surveys repeated at regular intervals is considered. A state space formulation is given for the problem and the Kalman Filter is used to obtain an estimate and its variance. Some examples are also given to illustrate the methodology.
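The idea can be sketched with the simplest state space model, a local-level (random walk plus noise) model. This is an assumption made for illustration, since the abstract does not specify the paper's formulation: the true characteristic is taken to follow a random walk with innovation variance `q`, each survey estimate is the true value plus sampling error with known variance, and `m0`, `p0` are a hypothetical prior mean and variance.

```python
def kalman_local_level(estimates, sampling_vars, q, m0, p0):
    """Scalar Kalman filter for a local-level model.

    estimates     : survey estimates, one per survey round
    sampling_vars : sampling variance of each survey estimate
    q             : variance of the random-walk change between rounds
    m0, p0        : prior mean and variance of the characteristic
    Returns a list of (filtered estimate, filtered variance) pairs.
    """
    m, p = m0, p0
    filtered = []
    for y, r in zip(estimates, sampling_vars):
        p_pred = p + q                 # predict: variance grows by q
        gain = p_pred / (p_pred + r)   # weight given to the new survey
        m = m + gain * (y - m)         # update the estimate
        p = (1.0 - gain) * p_pred      # update its variance
        filtered.append((m, p))
    return filtered

# Hypothetical series of four survey rounds with equal sampling variance.
for m, p in kalman_local_level([10.2, 10.8, 9.9, 10.5], [0.5] * 4,
                               q=0.1, m0=10.0, p0=1.0):
    print(round(m, 3), round(p, 3))
```

Under this model the filtered variance at each round is always below that round's sampling variance, which is the sense in which borrowing strength from earlier surveys improves on the current survey alone.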

11.
Summary.  Realistic statistical modelling of observational data often suggests a statistical model which is not fully identified, owing to potential biases that are not under the control of study investigators. Bayesian inference can be implemented with such a model, ideally with the most precise prior knowledge that can be ascertained. However, as a consequence of the non-identifiability, inference cannot be made arbitrarily accurate by choosing the sample size to be sufficiently large. In turn, this has consequences for sample size determination. The paper presents a sample size criterion that is based on a quantification of how much Bayesian learning can arise in a given non-identified model. A global perspective is adopted, whereby choosing larger sample sizes for some studies necessarily implies that some other potentially worthwhile studies cannot be undertaken. This suggests that smaller sample sizes should be selected with non-identified models, as larger sample sizes constitute a squandering of resources in making estimator variances very small compared with their biases. Particularly, consider two investigators planning the same study, one of whom admits to the potential biases at hand and consequently uses a non-identified model, whereas the other pretends that there are no biases, leading to an identified but less realistic model. It is seen that the former investigator always selects a smaller sample size than the latter, with the difference being quite marked in some illustrative cases.

12.
The problem of a sample allocation between strata in the case of multiparameter surveys is considered in this article. There are several multivariate sample allocation methods and, moreover, several criteria to deal with in such a case. A maximum coefficient of variation of estimators of the population mean of characters under study is taken as the optimality criterion. This article contains a study on a group of the methods that are easy to implement and do not need complex numerical computation; however, they all are approximate. Five such methods are presented and compared using a simulation study. Finally, it is shown which methods should be considered when designing a survey in which the multivariate sample allocation is to be involved.

13.
In clinical trials with binary endpoints, the required sample size depends not only on the specified type I error rate, the desired power and the treatment effect but also on the overall event rate which, however, is usually uncertain. The internal pilot study design has been proposed to overcome this difficulty. Here, nuisance parameters required for sample size calculation are re-estimated during the ongoing trial and the sample size is recalculated accordingly. We performed extensive simulation studies to investigate the characteristics of the internal pilot study design for two-group superiority trials where the treatment effect is captured by the relative risk. As the performance of the sample size recalculation procedure crucially depends on the accuracy of the applied sample size formula, we firstly explored the precision of three approximate sample size formulae proposed in the literature for this situation. It turned out that the unequal variance asymptotic normal formula outperforms the other two, especially in the case of unbalanced sample size allocation. Using this formula for sample size recalculation in the internal pilot study design assures that the desired power is achieved even if the overall rate is mis-specified in the planning phase. The maximum inflation of the type I error rate observed for the internal pilot study design is small and lies below the maximum excess that occurred for the fixed sample size design.

14.
Under given concrete exogenous conditions, the fraction of identifiable records in a microdata file without positive identifiers such as name and address is estimated. The effect of possible noise in the data, as well as the sample property of microdata files, is taken into account. Using real microdata files, it is shown that there is no risk of disclosure if the information content of characteristics known to the investigator (additional knowledge) is limited. Files with additional knowledge of large information content yield a high risk of disclosure. This can be eliminated only by massive modifications of the data records, which, however, involve large biases for complex statistical evaluations. In this case, the requirement for privacy protection and high-quality data perhaps may be fulfilled only if the linkage of such files with extensive additional knowledge is prevented by appropriate organizational and legal restrictions.

15.
Summary.  Latent class analysis has been used to model measurement error, to identify flawed survey questions and to estimate mode effects. Using data from a survey of University of Maryland alumni together with alumni records, we evaluate this technique to determine its usefulness for detecting bad questions in the survey context. Two sets of latent class analysis models are applied in this evaluation: latent class models with three indicators and latent class models with two indicators under different assumptions about prevalence and error rates. Our results indicated that the latent class analysis approach produced good qualitative results for the latent class models: the item that the model deemed the worst was the worst according to the true scores. However, the approach yielded weaker quantitative estimates of the error rates for a given item.

16.
We consider a general design that allows information for different patterns, or sets, of data items to be collected from different sample units, which we call a Split Questionnaire Design (SQD). While SQDs have been historically used to accommodate constraints on respondent burden, this paper shows they can also be an efficient design option. The efficiency of a design can be measured by the cost required to meet constraints on the accuracy of estimates. Moreover, this paper shows how an SQD provides considerable flexibility when exploring the balance between the design's efficiency and the burden it places on respondents. The targets of interest to the design are analytic parameters, such as regression coefficients. Empirical results show that SQDs are worth considering.

17.
The condition of fixed sample size is essential for the existence of a variance of the Sen-Yates-Grundy form and its design-unbiased estimator, in the problem of estimating the mean, variance and covariance of a finite population.

18.
We investigate several estimators of the negative binomial (NB) dispersion parameter for highly stratified count data for which the statistical model has a separate mean parameter for each stratum. If the number of samples per stratum is small then the model is highly parameterized and the maximum likelihood estimator (MLE) of the NB dispersion parameter can be biased and inefficient. Some of the estimators we investigate include adjustments for the number of mean parameters to reduce bias. We extend other estimators that were developed for the iid case, to reduce bias when there are many mean parameters. We demonstrate using simulations that an adjusted double extended quasi-likelihood estimator we proposed gives much improved estimates compared to the MLE. Adjusted extended quasi-likelihood and adjusted maximum likelihood estimators also give much-improved results. We illustrate the various estimators with stratified random bottom trawl survey data for cod (Gadus morhua) off the south coast of Newfoundland, Canada.

19.
We use the British Crime Survey (BCS) to analyse the demand for illicit drugs and the implications of drug use for the probability of subsequent unemployment. We demonstrate that the BCS questionnaire has a serious design flaw for this purpose, and we propose some simple modifications. We also develop a modelling technique that is suitable for existing BCS data and apply it to the 1994 and 1996 samples. We find evidence that soft drug use is associated with a greatly increased probability of later hard drug use and that past drug use is associated with increased probabilities of unemployment.

20.
We revisit the classic problem of estimation of the binomial parameters when both parameters n,p are unknown. We start with a series of results that illustrate the fundamental difficulties in the problem. Specifically, we establish lack of unbiased estimates for essentially any functions of just n or just p. We also quantify just how badly biased the sample maximum is as an estimator of n. Then, we motivate and present two new estimators of n. One is a new moment estimate and the other is a bias correction of the sample maximum. Both are easy to motivate, compute, and jackknife. The second estimate frequently beats most common estimates of n in the simulations, including the Carroll–Lombard estimate. This estimate is very promising. We end with a family of estimates for p; a specific one from the family is compared to the presently common estimate and the improvements in mean-squared error are often very significant. In all cases, the asymptotics are derived in one domain. Some other possible estimates such as a truncated MLE and empirical Bayes methods are briefly discussed.
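To make the problem concrete, the sketch below implements the classical method-of-moments estimates of n and p, obtained by equating the sample mean to np and the sample variance to np(1 - p). These are the standard textbook estimates, not the new estimators proposed in the paper, whose formulas the abstract does not give; the function name and data are hypothetical.

```python
from statistics import mean, pvariance

def binomial_moment_estimates(counts):
    """Classical method-of-moments estimates (n_hat, p_hat) for iid
    Binomial(n, p) counts with both parameters unknown.

    Solving mean = n*p and variance = n*p*(1 - p) gives
    p_hat = 1 - variance/mean and n_hat = mean**2 / (mean - variance).
    Returns None when the variance is not below the mean, in which case
    the moment equations have no valid solution.
    """
    m = mean(counts)
    v = pvariance(counts)  # variance with divisor len(counts)
    if m <= 0 or v >= m:
        return None
    n_hat = m * m / (m - v)
    p_hat = m / n_hat
    return n_hat, p_hat

# Hypothetical counts: the sample maximum is 6, yet the moment estimate of n is larger.
print(binomial_moment_estimates([3, 5, 4, 6, 2]))
# Overdispersed data give no valid moment estimate.
print(binomial_moment_estimates([0, 10]))
```

The second call illustrates one of the difficulties the abstract alludes to: when the sample variance is close to or above the sample mean, the moment estimate of n becomes unstable or undefined.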
