Similar articles
20 similar articles found.
1.
Multivariate shrinkage estimation of small area means and proportions
The familiar (univariate) shrinkage estimator of a small area mean or proportion combines information from the small area and a national survey. We define a multivariate shrinkage estimator which also combines information across subpopulations and outcome variables. The superiority of multivariate shrinkage over univariate shrinkage, and of univariate shrinkage over the unbiased (sample) means, is illustrated with examples of estimating local area rates of economic activity in subpopulations defined by ethnicity, age and sex. The examples use the sample of anonymized records of individuals from the 1991 UK census. The method requires no distributional assumptions but relies on the appropriateness of the quadratic loss function, and its implementation involves minimal computing. Multivariate shrinkage is particularly effective when the area-level means are highly correlated and the sample means of one or a few components have small sampling and between-area variances. Estimates for subpopulations based on small samples can be greatly improved by incorporating information from subpopulations with larger sample sizes.
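A minimal numerical sketch of the univariate shrinkage step described above (the multivariate version additionally pools across outcomes and subpopulations); the function name, the example numbers, and the assumption of known variance components are all illustrative:

```python
import numpy as np

# Univariate shrinkage of direct small-area means toward a national mean.
# Areas with noisier direct estimates (large v_area) are shrunk harder.
def univariate_shrinkage(ybar_area, v_area, ybar_national, sigma2_between):
    """ybar_area: direct area means; v_area: their sampling variances;
    sigma2_between: between-area variance of the true means (assumed known)."""
    gamma = sigma2_between / (sigma2_between + v_area)   # shrinkage weight per area
    return gamma * ybar_area + (1.0 - gamma) * ybar_national

ybar_area = np.array([0.62, 0.55, 0.71])   # direct economic activity rates
v_area = np.array([0.010, 0.002, 0.025])   # larger variance -> more shrinkage
print(univariate_shrinkage(ybar_area, v_area, 0.60, 0.004))
```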

2.
A model-based predictive estimator is proposed for the population proportions of a polychotomous response variable, based on a sample from the population and on auxiliary variables whose values are known for the entire population. The responses for the non-sample units are predicted using a multinomial logit model, which is a parametric function of the auxiliary variables. A bootstrap estimator is proposed for the variance of the predictive estimator; its consistency is proved and its small-sample performance is compared with that of an analytical estimator. The proposed predictive estimator is compared with other available estimators, including model-assisted ones, both in a simulation study involving different sampling designs and model mis-specification, and using real data from an opinion survey. The results indicate that the prediction approach appears to use auxiliary information more efficiently than the model-assisted approach.
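A hedged sketch of the predictive step: fit a multinomial logit on the sample, predict class probabilities for non-sample units from their known auxiliaries, and average observed indicators with predicted probabilities. scikit-learn's LogisticRegression (multinomial by default) stands in for the paper's model; the simulated data are illustrative and the bootstrap variance step is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
N, n, K = 5000, 300, 3                      # population, sample, classes
X = rng.normal(size=(N, 2))                 # auxiliaries known for all units
true_p = np.exp(X @ rng.normal(size=(2, K)))
y = np.array([rng.choice(K, p=p / p.sum()) for p in true_p])

s = rng.choice(N, size=n, replace=False)    # sampled unit indices
r = np.setdiff1d(np.arange(N), s)           # non-sample unit indices
fit = LogisticRegression(max_iter=1000).fit(X[s], y[s])

counts_s = np.bincount(y[s], minlength=K)        # observed sample counts
pred_r = fit.predict_proba(X[r]).sum(axis=0)     # predicted non-sample totals
print((counts_s + pred_r) / N)                   # predictive proportion estimates
```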

3.
Hypertension is a highly prevalent cardiovascular disease and a considerable cost factor for many national health systems. Despite its prevalence, regional disease distributions are often unknown and must be estimated from survey data. However, health surveys frequently lack regional observations due to limited resources, so the resulting prevalence estimates suffer from unacceptably large sampling variances and are not reliable. Small area estimation solves this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit-level or area-level observations are considered for this purpose, but for hypertension both levels should be used: hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information, and it is also correlated with socioeconomic indicators that are usually measured at the area level. Combining the levels is challenging, however, as it requires multi-level model parameter estimation from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameter estimation is performed via stochastic coordinate gradient descent, and a jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.

4.
The Fay–Herriot model is a standard model for direct survey estimators in which the true quantity of interest, the superpopulation mean, is latent and its estimation is improved through the use of auxiliary covariates. In the context of small area estimation, these estimates can be further improved by borrowing strength across spatial regions or by considering multiple outcomes simultaneously. We provide two formulations of Fay–Herriot models that include both multivariate outcomes and latent spatial dependence. In the first, the outcome-by-space dependence structure is separable; the second accounts for cross-dependence through a generalized multivariate conditional autoregressive (GMCAR) structure. The GMCAR model is shown, in a state-level example, to produce smaller mean square prediction errors, relative to equivalent census variables, than the separable model and the state-of-the-art multivariate model with unstructured dependence between outcomes and no spatial dependence. In addition, both the GMCAR and separable models give smaller mean squared prediction error than the state-of-the-art model when conducting small area estimation on county-level data from the American Community Survey.
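For orientation, a minimal sketch of the univariate, non-spatial Fay–Herriot EBLUP that both formulations above extend; the moment update for the model variance A is a crude illustrative choice (it ignores leverage terms), not the paper's estimator:

```python
import numpy as np

# Fay-Herriot EBLUP: shrink direct estimates y (with known sampling
# variances D) toward a regression synthetic estimate X @ beta.
def fay_herriot_eblup(y, D, X, iters=50):
    A = np.var(y)                                  # starting value for model variance
    for _ in range(iters):
        w = 1.0 / (A + D)
        beta = np.linalg.solve((X * w[:, None]).T @ X, X.T @ (w * y))
        resid = y - X @ beta
        A = max(1e-8, np.mean(resid**2 - D))       # crude moment update (illustrative)
    B = D / (A + D)                                # noisier areas get more shrinkage
    return B * (X @ beta) + (1.0 - B) * y

y = np.array([10.2, 8.9, 12.5, 9.7, 11.1])         # direct estimates
D = np.array([1.0, 0.2, 2.5, 0.4, 0.8])            # known sampling variances
X = np.column_stack([np.ones(5), [1.0, 0.5, 1.5, 0.8, 1.2]])
print(fay_herriot_eblup(y, D, X))
```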

5.
Longitudinal surveys have emerged in recent years as an important data collection tool for population studies where the primary interest is to examine population changes over time at the individual level. Longitudinal data are often analyzed through the generalized estimating equations (GEE) approach. The vast majority of the existing literature on the GEE method, however, is developed under non-survey settings and is inappropriate for data collected through complex sampling designs. In this paper the authors develop a pseudo-GEE approach for the analysis of survey data. They show that survey weights must, and can, be appropriately accounted for in the GEE method under a joint randomization framework. The consistency of the resulting pseudo-GEE estimators is established under the proposed framework. Linearization variance estimators are developed for the pseudo-GEE estimators when the finite population sampling fractions are small or negligible, a scenario that often holds for large-scale surveys. Finite-sample performance of the proposed estimators is investigated through an extensive simulation study using data from the National Longitudinal Survey of Children and Youth. The results show that the pseudo-GEE estimators and the linearization variance estimators perform well under several sampling designs and for both continuous and binary responses. The Canadian Journal of Statistics 38: 540–554; 2010 © 2010 Statistical Society of Canada
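The following toy sketch shows the core pseudo-GEE idea for a binary response with an independence working correlation: the usual estimating function is replaced by its survey-weighted version, sum_i w_i x_i (y_i - mu_i) = 0, solved by Newton-Raphson. It omits the repeated-measures structure and the linearization variance estimation developed in the paper; all data and names are simulated placeholders:

```python
import numpy as np

def pseudo_gee_logistic(X, y, w, iters=25):
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - mu))                       # weighted estimating function
        info = (X * (w * mu * (1 - mu))[:, None]).T @ X    # weighted information matrix
        beta += np.linalg.solve(info, score)
    return beta

rng = np.random.default_rng(7)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * X[:, 1]))))
w = rng.uniform(0.5, 2.0, size=n)                  # illustrative survey weights
print(pseudo_gee_logistic(X, y, w))                # roughly [0.3, 0.8]
```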

6.
We present a novel methodology for estimating the parameters of a finite mixture model (FMM) based on partially rank-ordered set (PROS) sampling and use it in a fishery application. A PROS sampling design first selects a simple random sample of fish and creates partially rank-ordered judgement subsets by dividing units into subsets of prespecified sizes. The final measurements are then obtained from these partially ordered judgement subsets. The traditional expectation–maximization algorithm is not directly applicable to these observations, so we propose a suitable expectation–maximization algorithm to estimate the parameters of FMMs based on PROS samples. We also study the problem of classifying a PROS sample into the components of the FMM. We show that the maximum likelihood estimators based on PROS samples perform substantially better than their simple random sample counterparts, even with small samples. The results are used to classify a fish population using length-frequency data.
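As a baseline, a textbook EM algorithm for a two-component Gaussian mixture fitted to simulated length-frequency data; the PROS-based EM above generalizes the E-step to account for the judgement-ranking structure, which this simple-random-sample sketch ignores:

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, iters=200):
    # Initialize: split means at the data extremes to break symmetry.
    pi, mu = 0.5, np.array([x.min(), x.max()])
    sd = np.array([x.std(), x.std()])
    for _ in range(iters):
        # E-step: posterior probability that each fish belongs to component 2
        f0, f1 = norm.pdf(x, mu[0], sd[0]), norm.pdf(x, mu[1], sd[1])
        tau = pi * f1 / ((1 - pi) * f0 + pi * f1)
        # M-step: weighted updates of mixing proportion, means, and sds
        pi = tau.mean()
        mu = np.array([np.average(x, weights=1 - tau), np.average(x, weights=tau)])
        sd = np.array([np.sqrt(np.average((x - mu[0])**2, weights=1 - tau)),
                       np.sqrt(np.average((x - mu[1])**2, weights=tau))])
    return pi, mu, sd, tau      # tau > 0.5 classifies a fish into component 2

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(20, 3, 300), rng.normal(35, 4, 200)])  # lengths (cm)
print(em_gaussian_mixture(x)[:3])
```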

7.
Nested error linear regression models using survey weights have been studied in small area estimation to obtain efficient model-based and design-consistent estimators of small area means. The covariates in these nested error linear regression models are not subject to measurement errors. In practical applications, however, there are many situations in which the covariates are subject to measurement errors. In this paper, we develop a nested error linear regression model with an area-level covariate subject to functional measurement error. In particular, we propose a pseudo-empirical Bayes (PEB) predictor to estimate small area means. This predictor borrows strength across areas through the model and makes use of the survey weights to preserve the design consistency as the area sample size increases. We also employ a jackknife method to estimate the mean squared prediction error (MSPE) of the PEB predictor. Finally, we report the results of a simulation study on the performance of our PEB predictor and associated jackknife MSPE estimator.
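The resampling skeleton behind such a jackknife error estimate, shown here as a plain delete-one jackknife for the variance of an estimator; the paper's MSPE jackknife for the PEB predictor is more elaborate (it includes a bias-corrected leading term), so this is only an illustration of the device:

```python
import numpy as np

def jackknife_variance(stat, data):
    """stat: function mapping a data array to a scalar estimate."""
    m = len(data)
    loo = np.array([stat(np.delete(data, u, axis=0)) for u in range(m)])
    return (m - 1) / m * np.sum((loo - loo.mean())**2)

rng = np.random.default_rng(5)
y = rng.normal(10, 2, size=50)
# For the sample mean, the jackknife variance equals s^2 / n exactly.
print(jackknife_variance(np.mean, y), y.var(ddof=1) / len(y))
```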

8.
In survey sampling, policy decisions regarding the allocation of resources to sub-groups of a population depend on reliable predictors of their underlying parameters. However, in some sub-groups, called small areas due to small sample sizes relative to the population, the information needed for reliable estimation is typically not available. Consequently, data on a coarser scale are used to predict the characteristics of small areas. Mixed models are the primary tools in small area estimation (SAE) and also borrow information from alternative sources (e.g., previous surveys and administrative and census data sets). In many circumstances, small area predictors are associated with location. For instance, in the case of chronic disease or cancer, it is important for policy makers to understand spatial patterns of disease in order to determine small areas with high risk of disease and establish prevention strategies. The literature considering SAE with spatial random effects is sparse and mostly in the context of spatial linear mixed models. In this article, small area models are proposed for the class of spatial generalized linear mixed models to obtain small area predictors and corresponding second-order unbiased mean squared prediction errors via Taylor expansion and a parametric bootstrap approach. The performance of the proposed approach is evaluated through simulation studies and application of the models to a real esophageal cancer data set from Minnesota, U.S.A. The Canadian Journal of Statistics 47: 426–437; 2019 © 2019 Statistical Society of Canada

9.
The stratified Cox model is commonly used for stratified clinical trials with time-to-event endpoints. The estimated log hazard ratio is approximately a weighted average of the corresponding stratum-specific Cox model estimates using inverse-variance weights; the latter are optimal only under the (often implausible) assumption of a constant hazard ratio across strata. Focusing on trials with limited sample sizes (50-200 subjects per treatment), we propose an alternative approach in which stratum-specific estimates are obtained using a refined generalized logrank (RGLR) approach and then combined using either sample-size or minimum-risk weights for overall inference. Our proposal extends the work of Mehrotra et al to incorporate the RGLR statistic, which outperforms the Cox model in the setting of proportional hazards and small samples. This work also entails the development of a remarkably accurate plug-in formula for the variance of RGLR-based estimated log hazard ratios. We demonstrate using simulations that our proposed two-step RGLR analysis delivers notably better results, through smaller estimation bias and mean squared error and larger power, than the stratified Cox model analysis when there is a treatment-by-stratum interaction, with similar performance when there is no interaction. Additionally, our method controls the type I error rate in small samples, while the stratified Cox model does not. We illustrate our method using data from a clinical trial comparing two treatments for colon cancer.
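A small sketch of the second (aggregation) step: combining given stratum-specific log hazard ratio estimates with either inverse-variance or sample-size weights. In the paper the stratum estimates come from the RGLR procedure; here they are arbitrary illustrative numbers, and the minimum-risk weighting is omitted:

```python
import numpy as np

def combine(log_hr, var, n=None, weights="inverse-variance"):
    """Weighted average of stratum log hazard ratios and its variance."""
    w = 1.0 / var if weights == "inverse-variance" else n.astype(float)
    w = w / w.sum()
    return np.sum(w * log_hr), np.sum(w**2 * var)

log_hr = np.array([-0.30, -0.05, -0.55])      # per-stratum estimates (illustrative)
var = np.array([0.04, 0.09, 0.16])            # their estimated variances
n = np.array([120, 80, 40])                   # per-stratum sample sizes
print(combine(log_hr, var))                   # optimal only if the HR is constant
print(combine(log_hr, var, n, weights="sample-size"))
```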

10.
The sample survey of below-designated-size industrial enterprises is an important component of socioeconomic statistical surveys, providing basic data for national accounts, and the representativeness of the sample directly determines the quality of the statistical inference. Drawing a balanced sample from the enterprise register makes the sample structure similar to the population structure. A balanced sample is one that satisfies the following condition: the Hansen–Hurwitz estimator of each auxiliary variable equals its true population total. A balanced sampling design requires a complete sampling frame rich in auxiliary information, for which official statistical data provide sufficient support. An empirical analysis based on the 2009 database of industrial enterprises shows that the balanced sampling design yields very small relative errors in the estimates of population totals; in particular, the estimated means are very close to the true population values, making them approximately unbiased. Compared with simple random sampling, the balanced sampling design is more efficient.
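The defining balance condition can be checked numerically: the design-weighted estimate of each auxiliary total should equal (or nearly equal) the known population total. The sketch below only verifies balance for a given sample, using a Horvitz–Thompson-style weighted total as a stand-in for the estimator named above; actually drawing balanced samples requires an algorithm such as the cube method, which is beyond a short sketch:

```python
import numpy as np

def balance_gap(X_pop, sample_idx, pi):
    """X_pop: N x q auxiliary matrix known for the whole frame;
    pi: inclusion probabilities of the sampled units.
    Returns the relative error of each estimated auxiliary total."""
    t_true = X_pop.sum(axis=0)
    t_hat = (X_pop[sample_idx] / pi[:, None]).sum(axis=0)
    return (t_hat - t_true) / t_true

rng = np.random.default_rng(11)
N, n = 1000, 100
X_pop = np.abs(rng.normal(50, 10, size=(N, 2)))   # e.g. employees, output value
idx = rng.choice(N, n, replace=False)             # SRS: gaps are generally nonzero
print(balance_gap(X_pop, idx, np.full(n, n / N)))
```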

11.
In the absence of placebo-controlled trials, the efficacy of a test treatment can alternatively be examined by showing its non-inferiority to an active control; that is, the test treatment is not worse than the active control by a pre-specified margin. The margin is based on the effect of the active control over placebo in historical studies. In other words, the non-inferiority setup involves a network of direct and indirect comparisons between the test treatment, active controls, and placebo. Given this framework, we consider a Bayesian network meta-analysis that models the uncertainty and heterogeneity of the historical trials into the non-inferiority trial in a data-driven manner through the use of the Dirichlet process and power priors. Depending on whether placebo was present in the historical trials, two cases of non-inferiority testing are discussed that are analogs of the synthesis and fixed-margin approaches. In each of these cases, the model provides a more reliable estimate of the control given its effect in other trials in the network, and, in the case where placebo was only present in the historical trials, the model can predict the effect of the test treatment over placebo as if placebo had been present in the non-inferiority trial. It can further answer other questions of interest, such as the comparative effectiveness of the test treatment among its comparators. More importantly, the model provides an opportunity for disproportionate randomization or the use of small sample sizes by allowing borrowing of information from a network of trials to draw explicit conclusions on non-inferiority. Copyright © 2015 John Wiley & Sons, Ltd.

12.
We consider the problem of supplementing survey data with additional information from a population. The framework we use is very general; examples are missing data problems, measurement error models and combining data from multiple surveys. We do not require the survey data to be a simple random sample of the population of interest. The key assumption we make is that there exists a set of common variables between the survey and the supplementary data. Thus, the supplementary data serve the dual role of providing adjustments to the survey data for model consistency and also enriching the survey data for improved efficiency. We propose a semi-parametric approach using empirical likelihood to combine data from the two sources. The method possesses favourable large and moderate sample properties. We use the method to investigate wage regression using data from the National Longitudinal Survey of Youth Study.

13.
Much of the small-area estimation literature focuses on population totals and means. However, users of survey data are often interested in the finite-population distribution of a survey variable and in the measures (e.g. medians, quartiles, percentiles) that characterize the shape of this distribution at the small-area level. In this paper we propose a model-based direct estimator (MBDE; Chandra and Chambers) of the small-area distribution function. The MBDE is defined as a weighted sum of sample data from the area of interest, with weights derived from the calibrated spline-based estimate of the finite-population distribution function introduced by Harms and Duchesne, under an appropriately specified regression model with random area effects. We also discuss mean squared error estimation for the MBDE. Monte Carlo simulations based on both simulated and real data sets show that the proposed MBDE and its associated mean squared error estimator perform well when compared with alternative estimators of the area-specific finite-population distribution function.
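For concreteness, a generic weighted estimator of a distribution function, F_hat(t) = sum_i w_i 1(y_i <= t) / sum_i w_i, evaluated over a grid; the MBDE's weights come from the calibrated spline fit under the random-effects model, whereas the uniform weights below are mere placeholders:

```python
import numpy as np

def weighted_edf(y, w, grid):
    """Weighted empirical distribution function of y evaluated at grid points."""
    w = w / w.sum()
    return np.array([np.sum(w * (y <= t)) for t in grid])

rng = np.random.default_rng(9)
y = rng.lognormal(mean=3.0, sigma=0.5, size=40)   # e.g. incomes in one small area
w = np.ones_like(y)                                # placeholder for MBDE weights
grid = np.quantile(y, [0.25, 0.5, 0.75])
print(weighted_edf(y, w, grid))                    # approx [0.25, 0.5, 0.75]
```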

14.
15.
In sample surveys of finite populations, subpopulations for which the sample size is too small for estimation of adequate precision are referred to as small domains. Demand for small domain estimates has been growing in recent years among users of survey data. We explore the possibility of enhancing the precision of domain estimators by combining comparable information collected in multiple surveys of the same population. For this, we propose a regression method of estimation that is essentially an extended calibration procedure whereby comparable domain estimates from the various surveys are calibrated to each other. We show through analytic results and an empirical study that this method may greatly improve the precision of domain estimators for the variables that are common to these surveys, as these estimators make effective use of increased sample size for the common survey items. The design-based direct estimators proposed involve only domain-specific data on the variables of interest. This is in contrast with small domain (mostly small area) indirect estimators, based on a single survey, which incorporate through modelling data that are external to the targeted small domains. The approach proposed is also highly effective in handling the closely related problem of estimation for rare population characteristics.

16.
Unit-level linear mixed models are often used in small area estimation (SAE), and the empirical best linear unbiased prediction (EBLUP) is widely used for the estimation of small area means under such models. However, EBLUP requires population-level auxiliary data, at least area-specific aggregated values. Sometimes population-level auxiliary data are either not available or not consistent with the survey data. We describe a SAE method that uses estimated population auxiliary information. Empirical results show that the proposed method produces an efficient set of small area estimates.

17.
The authors describe a method for fitting failure-time mixture models that postulate the existence of both susceptibles and long-term survivors when covariate data are only partially observed. Their method is based on a joint model that combines a Weibull regression model for the susceptibles, a logistic regression model for the probability of being a susceptible, and a general location model for the distribution of the covariates. A Bayesian approach is taken, and Gibbs sampling is used to fit the model to the incomplete data. An application to clinical data on tonsil cancer and a small Monte Carlo study indicate potentially large gains in efficiency over standard complete-case analysis, as well as reasonable performance in a variety of situations.

18.
In this paper, we propose a methodology for analyzing longitudinal data through distances between pairs of observations (or individuals) with regard to the explanatory variables used to fit continuous response variables. Restricted maximum likelihood and generalized least squares are used to estimate the parameters in the model. We applied this new approach to study the effect of gender and exposure on a deviant behavior variable, with respect to tolerance, for a group of youths studied over a period of 5 years. We performed simulations comparing our distance-based method with classical longitudinal analysis under both AR(1) and compound symmetry correlation structures, evaluating the models with the Akaike and Bayesian information criteria and the relative efficiency of the generalized variance of the errors of each model. We found small gains in fit for the proposed model relative to the classical methodology, particularly in small samples, regardless of variance, correlation, autocorrelation structure and number of time measurements.

19.
Large governmental surveys typically provide accurate national statistics. To decrease the mean squared error of estimates for small areas, i.e., domains in which the sample size is small, auxiliary variables from administrative records are often used as covariates in a mixed linear model. It is generally assumed that the auxiliary information is available for every small area. In many cases, though, such information is available for only some of the small areas, either from another survey or from a previous administration of the same survey. The authors propose and study small area estimators that use multivariate models to combine information from several surveys. They discuss computational algorithms, and a simulation study indicates that if quantities in the different surveys are sufficiently correlated, substantial gains in efficiency can be achieved.

20.
Using survey weights, You & Rao [You and Rao, The Canadian Journal of Statistics 2002; 30, 431–439] proposed a pseudo-empirical best linear unbiased prediction (pseudo-EBLUP) estimator of a small area mean under a nested error linear regression model. This estimator borrows strength across areas through a linking model and makes use of survey weights to ensure design consistency and to preserve the benchmarking property, in the sense that the estimators add up to a reliable direct estimator of the mean of a large area covering the small areas. In this article, a second-order approximation to the mean squared error (MSE) of the pseudo-EBLUP estimator of a small area mean is derived. Using this approximation, a nearly unbiased estimator of the MSE is obtained; the MSE estimator of You & Rao ignored cross-product terms in the MSE and hence is biased. Empirical results on the performance of the proposed MSE estimator are also presented. The Canadian Journal of Statistics 38: 598–608; 2010 © 2010 Statistical Society of Canada
