Estimation efficiency can be gained over equal probability sampling methods by using auxiliary information in probability proportional to size with replacement (πpswr) sampling. The usual method is simple to execute, but a particular unit may appear in the sample more than once. When a suitable size variable x is not available, one may still know how to rank units reasonably well relative to the unknown y values before sample selection. When such ranking is possible, we introduce a simple and efficient sampling plan that uses the ranks as the unknown x measures of size. The proposed sampling plan is similar to with replacement sampling, shares its simplicity, and has no greater sampling variance, but it is without replacement.
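To make the with replacement baseline concrete, here is a minimal sketch of probability proportional to size sampling with replacement, paired with the standard Hansen-Hurwitz estimator of a total and using ranks as the size measure. It is not the without replacement plan the abstract proposes, whose details are not given here. The function name, the toy population and the sample size are all hypothetical.

```python
import random

def hansen_hurwitz_total(y, size, n, rng):
    """Draw n units with replacement, with probability proportional to size,
    and return the Hansen-Hurwitz estimate of the population total of y."""
    total_size = sum(size)
    p = [s / total_size for s in size]           # per-draw selection probabilities
    draws = rng.choices(range(len(y)), weights=p, k=n)
    return sum(y[i] / p[i] for i in draws) / n   # mean of y_i / p_i over the draws

# Hypothetical population: y is unknown before sampling, so ranks stand in for size.
y = [3.0, 8.0, 12.0, 20.0, 31.0]
ranks = [1, 2, 3, 4, 5]
estimate = hansen_hurwitz_total(y, ranks, n=3, rng=random.Random(0))
```

When the size measure is exactly proportional to y, every draw contributes the same value and the estimate equals the true total. The closer the ranks track y, the smaller the sampling variance.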

A general explanation of the fiducial confidence interval and its construction for a class of parameters in which the distributions are stochastically increasing or decreasing is provided. Major differences between the fiducial interval and Bayesian and frequentist intervals are summarized. Applications of fiducial inference in evaluating pre-data frequentist intervals and general post-data intervals are discussed.

Summary. Over the past few years surveys have expanded to new populations, have incorporated measurement of new and more complex substantive issues and have adopted new data collection tools. At the same time there has been a growing reluctance among many household populations to participate in surveys. These factors have combined to present survey designers and survey researchers with increased uncertainty about the performance of any given survey design at any particular point in time. This uncertainty has, in turn, challenged the survey practitioner's ability to control the cost of data collection and the quality of resulting statistics. The development of computer-assisted methods for data collection has provided survey researchers with tools to capture a variety of process data ('paradata') that can be used to inform cost–quality trade-off decisions in real time. The ability to monitor continually the streams of process data and survey data creates the opportunity to alter the design during the course of data collection to improve survey cost efficiency and to achieve more precise, less biased estimates. We label such surveys as 'responsive designs'. The paper defines responsive design and uses examples to illustrate the responsive use of paradata to guide mid-survey decisions affecting the non-response, measurement and sampling variance properties of resulting statistics.


The systematic sampling (SYS) design (Madow and Madow, 1944) is widely used by statistical offices due to its simplicity and efficiency (e.g., Iachan, 1982). But it suffers from a serious defect, namely, that it is impossible to unbiasedly estimate the sampling variance (Iachan, 1982), and the usual variance estimators (Yates and Grundy, 1953) are inadequate and can overestimate the variance significantly (Särndal et al., 1992). We propose a novel variance estimator which is less biased and that can be implemented with any given population order. We will justify this estimator theoretically and with a Monte Carlo simulation study.
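A short sketch may help show why variance estimation is awkward under this design. With sampling interval k, only k distinct samples can ever be drawn, so a single draw behaves like one cluster and carries no within-design replication. The code below illustrates plain systematic selection only, not the variance estimator proposed in the abstract. The function names and the toy population are hypothetical.

```python
import random

def systematic_sample(units, k, rng):
    """Select every k-th unit after a random start in 0..k-1."""
    start = rng.randrange(k)
    return units[start::k]

def all_systematic_samples(units, k):
    """Enumerate the only k samples the design can produce."""
    return [units[start::k] for start in range(k)]

units = list(range(1, 13))   # hypothetical ordered population of 12 units
sample = systematic_sample(units, k=4, rng=random.Random(0))
possible = all_systematic_samples(units, k=4)
```

Two units that never share a sample have a joint inclusion probability of zero, which is what rules out an unbiased design-based variance estimator.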

Abstract. A flexible list sequential πps sampling method is introduced and studied. It can reproduce any given sampling design without replacement, of fixed or random sample size. The method is a splitting method and uses successive updating of inclusion probabilities. The main advantage of the method is in real-time sampling situations where it can be used as a powerful alternative to Bernoulli and Poisson sampling and can give any desired second-order inclusion probabilities and thus considerably reduce the variability of the sample size.

Abstract. Sampford's unequal probability sampling method is extended to the case that the inclusion probabilities do not sum to an integer. In this case, the sampling outcome is left open for exactly one randomly chosen unit and that unit gets a new inclusion probability. Three applications are presented. Two of them challenge traditional sampling routines. The simple Pareto sampling design, which was introduced by Rosén in 1997, is also extended. The extended Pareto design is shown to be close to the extended Sampford design.

This report provides guidelines for universities to consider in developing programs for training statisticians who will work in industry. Useful information for students who are considering industrial employment is also included. The recommended programs focus on real problems and the statistical theory and methodology that are useful in their solution. Technical competence is only one of many factors that industry considers when hiring and promoting statisticians. When a statistician leaves school, his or her skills and experiences should include statistical knowledge, practical problem solving, consulting practice, and the ability to communicate orally and in writing with nonstatisticians. There are many details that must be worked out (e.g., content of specific courses and organization of consulting internship programs), and it is hoped that the statistical societies and universities will form committees, hold conferences, and develop programs to address these issues further. Many of our recommendations apply more broadly to the training of all types of practicing statisticians.

Abstract. The efficiency of observational studies may be increased by applying multistage sampling designs. It is, however, not always transparent how to construct such a design to obtain increased efficiency. We here present a general statistical framework for describing and constructing multistage designs. We also provide tools for efficiency and cost‐efficiency comparisons, to facilitate the choice of sampling scheme. The comparisons are based on Fisher information matrices and the results are presented in graphs, where either efficiency or cost‐adjusted efficiency is plotted against a normalized measure of cost. The former curve resides in the unit square and is analogous to the receiver operating characteristic curve used for testing.

In survey sampling, the estimation or prediction of characteristics of both the population and its subpopulations (domains) is one of the key issues. When estimating or predicting domain characteristics, one problem is finding additional sources of information that can be used to increase the accuracy of estimators or predictors. One such source may be spatial and temporal autocorrelation. For the purposes of mean squared error (MSE) estimation, the standard assumption is that random variables are independent for population elements from different domains. Under that assumption, spatial correlation can be assumed only inside domains. In this paper, we assume a special case of the linear mixed model with two random components that obey the assumptions of the first-order spatial autoregressive model SAR(1) (but inside groups of domains instead of domains) and the first-order temporal autoregressive model AR(1). Based on the model, the empirical best linear unbiased predictor is proposed together with an estimator of its MSE that takes the spatial correlation between domains into account.

For the analysis of survey-weighted categorical data, one recommended method of analysis is a log-rate model. For each cell in a contingency table, the survey weights are averaged across subjects and incorporated into an offset for a loglinear model. Supposedly, one can then proceed with the analysis of unweighted observed cell counts. We provide theoretical and simulation-based evidence to show that the log-rate analysis is not an effective statistical analysis method and should not be used in general. The root of the problem is in its failure to properly account for variability in the individual weights within cells of a contingency table. This results in goodness-of-fit tests that have higher-than-nominal error rates and confidence intervals for odds ratios that have lower-than-nominal coverage.

Summary. More precise policy making at all levels of government has fuelled tremendous demand for small area data, smaller than ever before. At the same time, there has been an unprecedented accumulation of data in geographic information systems, administrative records databases and more sophisticated survey sampling schemes. Researchers and practitioners have been trying to combine these diverse sources of data. But how should these diverse sources of data be combined in a way that is policy relevant and statistically principled? The paper illustrates these questions with several example applications at the state, county and local level: emerging geographic information systems databases, the need for estimates of small area income, poverty, demographic and uninsurance data by health authorities, and how administrative records databases (such as licensed day care facilities, traffic counts and unemployment insurance records) are being harvested for their information content. Finally, the paper proposes approaches for integrating these diverse sources of data with different error, uncertainty and quality profiles, and surveys persistent challenges in this area.

Variance estimation for a low income proportion
Summary. Proportions below a given fraction of a quantile of an income distribution are often estimated from survey data in comparisons of poverty. We consider the estimation of the variance of such a proportion, estimated from Family Expenditure Survey data. We show how a linearization method of variance estimation may be applied to this proportion, allowing for the effects of both a complex sampling design and weighting by a raking method to population controls. We show that, for data for 1998–1999, the estimated variances are always increased when allowance is made for the design and raking weights, the principal effect arising from the design. We also study the properties of a simplified variance estimator and discuss extensions to a wider class of poverty measures.
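As a rough illustration of the quantity being estimated, the sketch below computes a survey-weighted proportion of incomes below a fraction of a weighted quantile. It covers the point estimate only, not the linearization variance estimator discussed in the abstract. The defaults of 0.6 and the median, the quantile definition and the function names are assumptions made for the example, not values taken from the paper.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q of the total weight
    (one of several possible quantile definitions; an assumption here)."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]

def low_income_proportion(income, weights, fraction=0.6, q=0.5):
    """Weighted share of units with income below fraction * weighted quantile."""
    threshold = fraction * weighted_quantile(income, weights, q)
    below = sum(w for x, w in zip(income, weights) if x < threshold)
    return below / sum(weights)

# Hypothetical incomes with equal survey weights.
share = low_income_proportion([10, 20, 30, 40, 100], [1, 1, 1, 1, 1])
```

Because the threshold is itself estimated from the same sample, the variance of this proportion cannot be read off a simple binomial formula. That dependence is what motivates a linearization approach.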

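Another four-level sample rotation method for China's labour force survey
Hou Zhiqiang. Statistical Research (《统计研究》), 2008, 25(6): 93-96
For the four-stage sampling design that China's labour force survey uses within some provincial-level units, this paper constructs a four-level sample rotation method for the case in which fourth-stage units are surveyed on five consecutive occasions. In this method, first-stage units follow the rotation pattern 40 in, second-stage units follow the rotation pattern 20 in, third-stage units follow the rotation pattern 10 in, and fourth-stage units follow the rotation pattern 5 in. The method keeps the sample size at every stage unchanged throughout the rotation, and it also ensures that fourth-stage units retain a certain matched sample between adjacent quarters and between the same quarter of adjacent years.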

Summary. We consider the problem of obtaining population-based inference in the presence of missing data and outliers in the context of estimating the prevalence of obesity and body mass index measures from the 'Healthy for life' study. Identifying multiple outliers in a multivariate setting is problematic because of problems such as masking, in which groups of outliers inflate the covariance matrix in a fashion that prevents their identification when included, and swamping, in which outliers skew covariances in a fashion that makes non-outlying observations appear to be outliers. We develop a latent class model that assumes that each observation belongs to one of K unobserved latent classes, with each latent class having a distinct covariance matrix. We consider the latent class covariance matrix with the largest determinant to form an 'outlier class'. By separating the covariance matrix for the outliers from the covariance matrices for the remainder of the data, we avoid the problems of masking and swamping. As did Ghosh-Dastidar and Schafer, we use a multiple-imputation approach, which allows us simultaneously to conduct inference after removing cases that appear to be outliers and to promulgate uncertainty in the outlier status through the model inference. We extend the work of Ghosh-Dastidar and Schafer by embedding the outlier class in a larger mixture model, consider penalized likelihood and posterior predictive distributions to assess model choice and model fit, and develop the model in a fashion to account for the complex sample design. We also consider the repeated sampling properties of the multiple imputation removal of outliers.

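A simulation study of capture-recapture sampling estimators
Capture-recapture sampling is now widely used in survey research in the biological sciences, social sciences, life sciences and medicine. This paper therefore compares, by simulation, the two most commonly used capture-recapture estimators, the Lincoln-Petersen estimator and the Chapman estimator, and proposes a new biased estimator whose bias lies between those of the two. Theoretical and simulation results show that the new estimator sometimes outperforms the other two. In practice, when it is hard to choose between the two existing estimators, the new estimator can serve as a better choice.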
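For reference, the two classical estimators named above can be written in a few lines. This sketch uses the standard textbook forms, with n1 animals marked on the first occasion, n2 caught on the second and m marked animals among the recaptures. The new estimator proposed in the paper is not specified in the abstract, so it is not reproduced here. The function and argument names are illustrative.

```python
def lincoln_petersen(n1, n2, m):
    """Lincoln-Petersen estimate of population size: n1 * n2 / m."""
    if m == 0:
        raise ValueError("Lincoln-Petersen is undefined when no marked units are recaptured")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's modification: (n1 + 1)(n2 + 1) / (m + 1) - 1, defined even when m = 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts: 100 marked, 80 caught later, 20 of them marked.
lp_estimate = lincoln_petersen(100, 80, 20)
chapman_estimate = chapman(100, 80, 20)
```

With these counts the Chapman estimate is slightly below the Lincoln-Petersen estimate, consistent with its role as a bias-reducing adjustment for small recapture counts.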

The 175th anniversary of the ASA provides an opportunity to look back into the past and peer into the future. What led our forebears to found the association? What commonalities do we still see? What insights might we glean from their experiences and observations? I will use the anniversary as a chance to reflect on where we are now and where we are headed in terms of statistical education amidst the growth of data science. Statistics is the science of learning from data. By fostering more multivariable thinking, building data-related skills, and developing simulation-based problem solving, we can help to ensure that statisticians are fully engaged in data science and the analysis of the abundance of data now available to us.

Likelihood computation in spatial statistics requires accurate and efficient calculation of the normalizing constant (i.e. partition function) of the Gibbs distribution of the model. Two available methods to calculate the normalizing constant by Markov chain Monte Carlo methods are compared by simulation experiments for an Ising model, a Gaussian Markov field model and a pairwise interaction point field model.

Five statistical software packages for epidemiology and clinical trials are reviewed. The five packages are EPI INFO, EPICURE, EPILOG PLUS, STATA, and TRUE EPI-STAT. Only DOS versions of these packages are compared and rated (Windows versions are discussed but not rated). Although the packages differ in their target audiences, interfaces, capabilities, and approaches, they are examined according to criteria that are of most interest to epidemiologists, biostatisticians, and others involved in epidemiologic and clinical research. A general discussion with recommendations follows the review of the statistical packages.

A common data mining task is the search for associations in large databases. Here we consider the search for “interestingly large” counts in a large frequency table, having millions of cells, most of which have an observed frequency of 0 or 1. We first construct a baseline or null hypothesis expected frequency for each cell, and then suggest and compare screening criteria for ranking the cell deviations of observed from expected count. A criterion based on the results of fitting an empirical Bayes model to the cell counts is recommended. An example compares these criteria for searching the FDA Spontaneous Reporting System database maintained by the Division of Pharmacovigilance and Epidemiology. In the example, each cell count is the number of reports combining one of 1,398 drugs with one of 952 adverse events (total of cell counts = 4.9 million), and the problem is to screen the drug-event combinations for possible further investigation.
