1.

Small area estimation of proportions in business surveys

《Journal of Statistical Computation and Simulation》2012,82(6):783-795

Binary data are often of interest in business surveys, particularly when the aim is to characterize grouping in the businesses making up the survey population. When small area estimates are required for such binary data, use of standard estimation methods based on linear mixed models (LMMs) becomes problematic. We explore two model-based techniques of small area estimation for small area proportions, the empirical best predictor (EBP) under a generalized linear mixed model and the model-based direct estimator (MBDE) under a population-level LMM. Our empirical results show that both the MBDE and the EBP perform well. The EBP is a computationally intensive method, whereas the MBDE is easy to implement. In the case of model misspecification, the MBDE also appears to be more robust. The mean-squared error (MSE) estimation of the MBDE is simple and straightforward, in contrast to the complicated MSE estimation for the EBP.

2.

Empirical best linear unbiased and empirical Bayes prediction in multivariate small area estimation

《Journal of statistical planning and inference》1999,75(2):269-279

Small area estimation plays a prominent role in survey sampling due to a growing demand for reliable small area estimates from both public and private sectors. The popularity of model-based inference is increasing in survey sampling, particularly in small area estimation. The estimates of the small area parameters can profitably ‘borrow strength’ from data on related multiple characteristics and/or auxiliary variables from other neighboring areas through appropriate models. Fay (1987, Small Area Statistics, Wiley, New York, pp. 91–102) proposed multivariate regression for small area estimation of multiple characteristics. The success of this modeling rests essentially on the strength of correlation of these dependent variables. To estimate small area mean vectors of multiple characteristics, multivariate modeling has been proposed in the literature via a multivariate variance components model. We use this approach for empirical best linear unbiased and empirical Bayes prediction of small area mean vectors. We use data from Battese et al. (1988, J. Amer. Statist. Assoc. 83, 28–36) to conduct a simulation which shows that the multivariate approach may achieve substantial improvement over the usual univariate approach.

3.

On Measuring Uncertainty of Benchmarked Predictors with Application to Disease Risk Estimate

Tatsuya Kubokawa Mana Hasukawa Kunihiko Takahashi 《Scandinavian Journal of Statistics》2014,41(2):394-413

Empirical Bayes (EB) estimates in general linear mixed models are useful for small area estimation in the sense of increasing the precision of estimation of small area means. However, one potential difficulty of EB is that the overall estimate for a larger geographical area based on a (weighted) sum of EB estimates is not necessarily identical to the corresponding direct estimate such as the overall sample mean. Another difficulty is that EB estimates yield over‐shrinking, which results in a sampling variance smaller than the posterior variance. One way to fix these problems is the benchmarking approach based on the constrained empirical Bayes (CEB) estimators, which satisfy the constraints that the aggregated mean and variance are identical to the requested values of mean and variance. In this paper, we treat the general mixed models, derive asymptotic approximations of the mean squared error (MSE) of CEB and provide second‐order unbiased estimators of MSE based on the parametric bootstrap method. These results are applied to natural exponential families with quadratic variance functions. As a specific example, the Poisson‐gamma model is dealt with, and it is illustrated through real mortality data that the CEB estimates and their MSE estimates work well.
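
The aggregation mismatch described in this abstract can be illustrated with a minimal sketch. It uses plain ratio benchmarking, a simple textbook adjustment swapped in for illustration, and not the constrained empirical Bayes estimator of the paper. The function name, the population shares, and all numbers are hypothetical.

```python
def ratio_benchmark(estimates, weights, target):
    """Rescale area estimates so that their weighted sum equals `target`."""
    total = sum(w * e for w, e in zip(weights, estimates))
    if total == 0:
        raise ValueError("weighted sum of estimates is zero; cannot rescale")
    factor = target / total
    return [factor * e for e in estimates]


# Hypothetical model-based estimates for three areas and their population shares.
eb = [10.0, 12.0, 20.0]
shares = [0.5, 0.3, 0.2]
direct_overall = 13.0  # hypothetical direct estimate for the larger area

# Before the adjustment the weighted sum is 12.6, not 13.0.
benchmarked = ratio_benchmark(eb, shares, direct_overall)
```

Ratio benchmarking only matches the aggregated mean; the variance constraint mentioned in the abstract is not addressed by this sketch.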

4.

An objective stepwise Bayes approach to small area estimation

Yanping Qu Bo Zhang 《Journal of Statistical Computation and Simulation》2015,85(7):1474-1494

The term ‘small area’ or ‘small domain’ is commonly used to denote a small geographical area that has a small subpopulation of people within a large area. Small area estimation is an important area in survey sampling because of the growing demand for better statistical inference for small areas in public or private surveys. In small area estimation problems the focus is on how to borrow strength across areas in order to develop a reliable estimator that makes use of available auxiliary information. Some traditional methods for small area problems such as empirical best linear unbiased prediction borrow strength through linear models that provide links to related areas, which may not be appropriate for some survey data. In this article, we propose a stepwise Bayes approach which borrows strength through an objective posterior distribution. This approach results in a generalized constrained Dirichlet posterior estimator when auxiliary information is available for small areas. The objective posterior distribution is based only on the assumption of exchangeability across related areas and does not make any explicit model assumptions. The form of our posterior distribution allows us to assign a weight to each member of the sample. These weights can then be used in a straightforward fashion to make inferences about the small area means. Theoretically, the stepwise Bayes character of the posterior allows one to prove the admissibility of the point estimators, suggesting that inferential procedures based on this approach will tend to have good frequentist properties. Numerically, we demonstrate in simulations that the proposed stepwise Bayes approach can have substantial strengths compared to traditional methods.

5.

Small area estimation of proportions under a spatial dependent aggregated level random effects model

Hukum Chandra Nicola Salvati 《Communications in Statistics - Theory and Methods》2018,47(5):1234-1255

This paper describes small area estimation (SAE) of proportions under a spatial dependent generalized linear mixed model using aggregated level data. The SAE is also applied to produce reliable district level estimates and mapping of incidence of indebtedness in the State of Uttar Pradesh in India using debt and investment survey data collected by the National Sample Survey Office (NSSO) and the secondary data from the Census. The results show a significant improvement in precision of model-based estimates generated by SAE as compared to direct estimates. The estimates generated by incorporating spatial information are more efficient than those generated by ignoring this information.

6.

The use of power transformations in small area estimation

Getachew Asfaw Dagne 《Journal of applied statistics》2003,30(4):411-423

Sample surveys are usually designed and analysed to produce estimates for larger areas. Nevertheless, sample sizes are often not large enough to give adequate precision for small area estimates of interest. To overcome such difficulties, borrowing strength from related small areas via modelling becomes essential. In line with this, we propose components of variance models with power transformations for small area estimation. This paper reports the results of a study aimed at incorporating the power transformation in small area estimation for improving the quality of small area predictions. The proposed methods are demonstrated on satellite data in conjunction with survey data to estimate mean acreage under a specified crop for counties in Iowa.
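
As a rough illustration of what a power transformation does to a positive response, the sketch below implements the Box-Cox family and its inverse. The abstract does not say which power family the paper uses, so the choice of Box-Cox is an assumption, as are the function names.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transformation of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transformation")
    return base ** (1.0 / lam)
```

Back-transforming a mean predicted on the transformed scale does not in general give the mean on the original scale, so a model fitted after transformation usually needs a bias adjustment that this sketch leaves out.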

7.

Hierarchical Bayes estimation in small area estimation using cross-sectional and time-series data

《Journal of Statistical Computation and Simulation》2012,82(3):605-613

Bayesian methods have been extensively used in small area estimation. A linear model incorporating autocorrelated random effects and sampling errors was previously proposed in small area estimation using both cross-sectional and time-series data in the Bayesian paradigm. There are, however, many situations in which we have time-related counts or proportions in small area estimation; for example, a monthly dataset on the number of incidences in small areas. This article considers hierarchical Bayes generalized linear models for a unified analysis of both discrete and continuous data, incorporating both cross-sectional and time-series data. The performance of the proposed approach is evaluated through several simulation studies and also on a real dataset.

8.

Penalized calibration in survey sampling: Design-based estimation assisted by mixed models

Fabien Guggemos Yves Tillé 《Journal of statistical planning and inference》2010

Calibration techniques in survey sampling, such as generalized regression estimation (GREG), were formalized in the 1990s to produce efficient estimators of linear combinations of study variables, such as totals or means. They implicitly rely on the assumption of a linear regression model between the variable of interest and some auxiliary variables in order to yield estimates that have lower variance if the model is true and remain approximately design-unbiased even if the model does not hold. We propose a new class of model-assisted estimators obtained by releasing a few calibration constraints and replacing them with a penalty term. This penalization is added to the distance criterion to be minimized. By introducing the concept of penalized calibration, combining usual calibration and this ‘relaxed’ calibration, we are able to adjust the weight given to the available auxiliary information. We obtain a more flexible estimation procedure giving better estimates, particularly when the auxiliary information is overly abundant or not fully appropriate to be completely used. Such an approach can also be seen as a design-based alternative to the estimation procedures based on the more general class of mixed models, presenting new prospects in some scopes of application such as inference on small domains.
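
The contrast between exact and relaxed calibration can be sketched as follows. The code assumes the chi-square distance sum((w - d)^2 / d) and a quadratic penalty on the calibration error, which yields a ridge-type closed form; this is one simple instance of the idea and is not claimed to be the estimator class of the paper. The function name, the `penalty_cost` parameter, and the data are hypothetical.

```python
import numpy as np


def calibrate(d, X, totals, penalty_cost=None):
    """Calibration weights under the chi-square distance.

    d: design weights, shape (n,); X: auxiliary values, shape (n, p);
    totals: known population totals, shape (p,).
    penalty_cost: None for exact calibration, or p positive costs;
    a larger cost enforces the corresponding constraint more tightly.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    T = X.T @ (d[:, None] * X)  # X' D X
    if penalty_cost is not None:
        T = T + np.diag(1.0 / np.asarray(penalty_cost, dtype=float))
    lam = np.linalg.solve(T, totals - X.T @ d)
    return d + d * (X @ lam)


# Hypothetical sample of four units with an intercept and one auxiliary variable.
d = np.array([10.0, 10.0, 10.0, 10.0])
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]])
totals = np.array([42.0, 180.0])

w_exact = calibrate(d, X, totals)                          # reproduces the totals
w_relaxed = calibrate(d, X, totals, penalty_cost=[1.0, 1.0])  # only approximately
```

With `penalty_cost=None` the weights reproduce the known totals exactly; with finite costs the weighted totals move toward the known totals without matching them, which is the trade-off the penalty term controls.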

9.

Variance estimation when donor imputation is used to fill in missing values

Jean‐François Beaumont Cynthia Bocci 《Revue canadienne de statistique》2009,37(3):400-416

Donor imputation is frequently used in surveys. However, very few variance estimation methods that take into account donor imputation have been developed in the literature. This is particularly true for surveys with high sampling fractions using nearest donor imputation, often called nearest‐neighbour imputation. In this paper, the authors develop a variance estimator for donor imputation based on the assumption that the imputed estimator of a domain total is approximately unbiased under an imputation model; that is, a model for the variable requiring imputation. Their variance estimator is valid, irrespective of the magnitude of the sampling fractions and the complexity of the donor imputation method, provided that the imputation model mean and variance are accurately estimated. They evaluate its performance in a simulation study and show that nonparametric estimation of the model mean and variance via smoothing splines brings robustness with respect to imputation model misspecifications. They also apply their variance estimator to real survey data when nearest‐neighbour imputation has been used to fill in the missing values. The Canadian Journal of Statistics 37: 400–416; 2009 © 2009 Statistical Society of Canada
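
For readers unfamiliar with nearest‐neighbour imputation, a minimal sketch with a single auxiliary variable is given below. It shows only the imputation step, not the variance estimator developed in the paper; the function name and the absolute-distance matching rule are assumptions made for illustration.

```python
def nearest_neighbour_impute(x, y):
    """Fill each missing y (None) with the y of the respondent closest in x."""
    donors = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not donors:
        raise ValueError("no respondents available to act as donors")
    imputed = []
    for xi, yi in zip(x, y):
        if yi is None:
            # Ties are resolved in favour of the first donor encountered.
            _, yi = min(donors, key=lambda donor: abs(donor[0] - xi))
        imputed.append(yi)
    return imputed


# Hypothetical data: units 2 and 4 did not respond.
filled = nearest_neighbour_impute([1.0, 2.0, 3.5, 10.0], [5.0, None, 7.0, None])
```

Because imputed values are copies of observed ones, treating `filled` as fully observed data would understate the variance, which is the problem the abstract addresses.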

10.

Noninformative nonparametric quantile estimation for simple random samples

《Journal of statistical planning and inference》2006,136(1):53-67

For noninformative nonparametric estimation of finite population quantiles under simple random sampling, estimation based on the Polya posterior is similar to estimation based on the Bayesian approach developed by Ericson (J. Roy. Statist. Soc. Ser. B 31 (1969) 195) in that the Polya posterior distribution is the limit of Ericson's posterior distributions as the weight placed on the prior distribution diminishes. Furthermore, Polya posterior quantile estimates can be shown to be admissible under certain conditions. We demonstrate the admissibility of the sample median as an estimate of the population median under such a set of conditions. As with Ericson's Bayesian approach, Polya posterior-based interval estimates for population quantiles are asymptotically equivalent to the interval estimates obtained from standard frequentist approaches. In addition, for small to moderate sized populations, Polya posterior-based interval estimates for quantiles of a continuous characteristic of interest tend to agree with the standard frequentist interval estimates.
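
The Polya posterior is commonly simulated with an urn scheme: starting from the observed sample, a value is drawn at random and returned to the urn together with a copy, until the urn holds as many values as the population. The sketch below follows that description; the function names, the use of the lower median, and the seed handling are assumptions for illustration.

```python
import random


def polya_complete(sample, population_size, rng):
    """Simulate one complete population from the Polya posterior."""
    if population_size < len(sample):
        raise ValueError("population cannot be smaller than the sample")
    urn = list(sample)
    while len(urn) < population_size:
        urn.append(rng.choice(urn))  # draw a value and add a copy of it
    return urn


def polya_median_draws(sample, population_size, n_draws, seed=0):
    """Medians of repeatedly simulated populations (lower median if N is even)."""
    rng = random.Random(seed)
    medians = []
    for _ in range(n_draws):
        population = sorted(polya_complete(sample, population_size, rng))
        medians.append(population[(len(population) - 1) // 2])
    return medians
```

Sorting the simulated medians and reading off empirical percentiles gives a simulation-based interval estimate for the population median.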

11.

Semi‐parametric small‐area estimation by combining time‐series and cross‐sectional data methods

Farhad Shokoohi Mahmoud Torabi 《Australian & New Zealand Journal of Statistics》2018,60(3):323-342

In survey sampling, policymaking regarding the allocation of resources to subgroups (called small areas) or the determination of subgroups with specific properties in a population should be based on reliable estimates. Information, however, is often collected at a different scale than that of these subgroups; hence, the estimation can only be obtained on finer scale data. Parametric mixed models are commonly used in small‐area estimation. The relationship between predictors and response, however, may not be linear in some real situations. Recently, small‐area estimation using a generalised linear mixed model (GLMM) with a penalised spline (P‐spline) regression model, for the fixed part of the model, has been proposed to analyse cross‐sectional responses, both normal and non‐normal. However, there are many situations in which the responses in small areas are serially dependent over time. Such a situation is exemplified by a data set on the annual number of visits to physicians by patients seeking treatment for asthma, in different areas of Manitoba, Canada. In cases where covariates that can possibly predict physician visits by asthma patients (e.g. age and genetic and environmental factors) may not have a linear relationship with the response, new models for analysing such data sets are required. In the current work, using both time‐series and cross‐sectional data methods, we propose P‐spline regression models for small‐area estimation under GLMMs. Our proposed model covers both normal and non‐normal responses. In particular, the empirical best predictors of small‐area parameters and their corresponding prediction intervals are studied with the maximum likelihood estimation approach being used to estimate the model parameters. The performance of the proposed approach is evaluated using some simulations and also by analysing two real data sets (precipitation and asthma).

12.

The Bernstein–von Mises theorem in semiparametric competing risks models

Pierpaolo De Blasi Nils Lid Hjort 《Journal of statistical planning and inference》2009

Semiparametric Bayesian models are nowadays a popular tool in event history analysis. An important area of research concerns the investigation of frequentist properties of posterior inference. In this paper, we propose novel semiparametric Bayesian models for the analysis of competing risks data and investigate the Bernstein–von Mises theorem for differentiable functionals of model parameters. The model is specified by expressing the cause-specific hazard as the product of the conditional probability of a failure type and the overall hazard rate. We take the conditional probability as a smooth function of time and leave the cumulative overall hazard unspecified. A prior distribution is defined on the joint parameter space, which includes a beta process prior for the cumulative overall hazard. We first develop the large-sample properties of maximum likelihood estimators by giving simple sufficient conditions for them to hold. Then, we show that, under the chosen priors, the posterior distribution for any differentiable functional of interest is asymptotically equivalent to the sampling distribution derived from maximum likelihood estimation. A simulation study is provided to illustrate the coverage properties of credible intervals on cumulative incidence functions.

13.

M-quantile models with application to poverty mapping

Nikos Tzavidis Nicola Salvati Monica Pratesi Ray Chambers 《Statistical Methods and Applications》2008,17(3):393-411

Over the last decade there has been growing demand for estimates of population characteristics at small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation is random effects models that include random area effects to account for between-area variation. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier robust inference. An alternative approach to small area estimation that is based on the use of M-quantile models was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumptions and automatically provide outlier robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves the traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest both under an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs an outlier-robust estimation method.
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania for estimating (a) district level estimates of the incidence of poverty in Albania, (b) district level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north east) with the wealthiest districts, which are also linked with high levels of inequality, in the coastal (south west) and southern part of the country. We discuss the practical advantages of our methodology and note the consistency of our results with results from previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.

14.

Exploring spatial dependence in area-level random effect model for disaggregate-level crop yield estimation

Hukum Chandra 《Journal of applied statistics》2013,40(4):823-842

This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area (or district or aggregated) level data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of crop yield for paddy in the state of Uttar Pradesh in India using the data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme and the secondary data from the Population Census. The diagnostic measures are illustrated to examine the model assumptions as well as reliability and validity of the generated model-based small area estimates. The results show a considerable gain in precision in model-based estimates produced by applying SAE. Furthermore, the model-based estimates obtained by exploiting spatial information are more efficient than those obtained by ignoring this information. However, both of these model-based estimates are more efficient than the direct survey estimate. In many districts, there are no survey data and therefore it is not possible to produce direct survey estimates for these districts. The model-based estimates generated using SAE are still reliable for such districts. These estimates produced by using SAE will provide invaluable information to policy-analysts and decision-makers.
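
The way an area-level model trades off a direct estimate against a regression prediction can be sketched with the basic Fay-Herriot-type shrinkage formula. This is the non-spatial case with independent area effects, and it treats the regression prediction and both variances as known, whereas in practice they are estimated; the spatial correlation that is central to the paper is left out. The function name and the numbers are hypothetical.

```python
def area_level_blup(direct, synthetic, sampling_var, sigma2_u):
    """Shrinkage predictor for one area under a basic area-level model.

    direct: direct survey estimate, or None if the area has no sample.
    synthetic: regression prediction for the area (assumed given).
    sampling_var: sampling variance of the direct estimate.
    sigma2_u: variance of the area random effect (assumed known).
    """
    if direct is None:
        return synthetic  # no survey data: rely on the model prediction alone
    gamma = sigma2_u / (sigma2_u + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical district: equal variances give equal weight to both sources.
estimate = area_level_blup(direct=50.0, synthetic=40.0, sampling_var=4.0, sigma2_u=4.0)
```

A precise direct estimate (small `sampling_var`) pulls the weight `gamma` toward 1, while an unsampled district falls back entirely on the regression prediction.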

15.

Variance estimation for a low income proportion

Yves G. Berger Chris J. Skinner 《Journal of the Royal Statistical Society. Series C, Applied statistics》2003,52(4):457-468

Summary. Proportions below a given fraction of a quantile of an income distribution are often estimated from survey data in comparisons of poverty. We consider the estimation of the variance of such a proportion, estimated from Family Expenditure Survey data. We show how a linearization method of variance estimation may be applied to this proportion, allowing for the effects of both a complex sampling design and weighting by a raking method to population controls. We show that, for data for 1998–1999, the estimated variances are always increased when allowance is made for the design and raking weights, the principal effect arising from the design. We also study the properties of a simplified variance estimator and discuss extensions to a wider class of poverty measures.
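
The point estimate whose variance the paper studies can be sketched as a weighted proportion below a fraction of a weighted quantile. The defaults of 60% and the median are assumptions chosen for illustration, since the abstract speaks only of a given fraction of a quantile; the quantile definition and function names are likewise hypothetical, and the linearization variance estimator itself is not shown.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share is at least q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]


def low_income_proportion(incomes, weights, fraction=0.6, q=0.5):
    """Weighted share of units with income below fraction * q-quantile."""
    threshold = fraction * weighted_quantile(incomes, weights, q)
    below = sum(w for y, w in zip(incomes, weights) if y < threshold)
    return below / sum(weights)


# Hypothetical incomes with equal weights: median 300, threshold 180.
p_equal = low_income_proportion([100, 200, 300, 400, 1000], [1, 1, 1, 1, 1])
```

Because the threshold is itself estimated from the same data, the variance of this proportion is not that of an ordinary proportion, which is why a dedicated linearization is needed.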

16.

Efficient estimation of variance components in nonparametric mixed-effects models with large samples

Nathaniel E. Helwig 《Statistics and Computing》2016,26(6):1319-1336

Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time.

17.

Robust estimation of variance components

Daniel Gervini Victor J. Yohai 《Revue canadienne de statistique》1998,26(3):419-430

New robust estimates for variance components are introduced. Two simple models are considered: the balanced one-way classification model with a random factor and the balanced mixed model with one random factor and one fixed factor. However, the method of estimation proposed can be extended to more complex models. The new method of estimation we propose is based on the relationship between the variance components and the coefficients of the least-mean-squared-error predictor between two observations of the same group. This relationship enables us to transform the problem of estimating the variance components into the problem of estimating the coefficients of a simple linear regression model. The variance-component estimators derived from the least-squares regression estimates are shown to coincide with the maximum-likelihood estimates. Robust estimates of the variance components can be obtained by replacing the least-squares estimates by robust regression estimates. In particular, a Monte Carlo study shows that for outlier-contaminated normal samples, the estimates of variance components derived from GM regression estimates and the derived test outperform other robust procedures.

18.

Generalised Variance Function Estimation for Binary Variables in Large‐Scale Sample Surveys

Ricardo Cao José A. Vilar Juan M. Vilar 《Australian & New Zealand Journal of Statistics》2012,54(3):301-324

Generalised variance function (GVF) models are data analysis techniques often used in large‐scale sample surveys to approximate the design variance of point estimators for population means and proportions. Some potential advantages of the GVF approach include operational simplicity, more stable sampling error estimates and providing a convenient method of summarising results when a high number of survey variables is considered. In this paper, several parametric and nonparametric methods for GVF estimation with binary variables are proposed and compared. The behaviour of these estimators is analysed under heteroscedasticity and in the presence of outliers and influential observations. An empirical study based on the annual survey of living conditions in Galicia (a region in the northwest of Spain) illustrates the behaviour of the proposed estimators.
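
A minimal sketch of one classical parametric GVF, relvariance = a + b / estimate, fitted by ordinary least squares, is shown below. The abstract does not state which functional forms the paper compares, so this form, the fitting method, and the function names are assumptions for illustration only.

```python
def fit_gvf(estimates, relvariances):
    """Least-squares fit of relvariance = a + b / estimate."""
    u = [1.0 / x for x in estimates]  # regressor is the reciprocal estimate
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(relvariances) / n
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    if sxx == 0:
        raise ValueError("estimates must not all be equal")
    sxy = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, relvariances))
    b = sxy / sxx
    a = mean_v - b * mean_u
    return a, b


def predict_relvariance(estimate, a, b):
    """Smoothed relative variance for a new estimate."""
    return a + b / estimate
```

Once `a` and `b` are fitted on estimates with directly computed variances, `predict_relvariance` supplies an approximate relative variance for any other estimate of the same type. Ordinary least squares is sensitive to outliers and heteroscedasticity, the two conditions the abstract singles out.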

19.

An elastic net penalized small area model combining unit- and area-level data for regional hypertension prevalence estimation

J. P. Burgard J. Krause R. Münnich 《Journal of applied statistics》2021,48(9):1659

Hypertension is a highly prevalent cardiovascular disease. It is a considerable cost factor for many national health systems. Despite its prevalence, regional disease distributions are often unknown and must be estimated from survey data. However, health surveys frequently lack regional observations due to limited resources. Obtained prevalence estimates suffer from unacceptably large sampling variances and are not reliable. Small area estimation solves this problem by linking auxiliary data from multiple regions in suitable regression models. Typically, either unit- or area-level observations are considered for this purpose. But with respect to hypertension, both levels should be used. Hypertension has characteristic comorbidities and is strongly related to lifestyle features, which are unit-level information. It is also correlated with socioeconomic indicators that are usually measured at the area level. But the level combination is challenging as it requires multi-level model parameter estimation from small samples. We use a multi-level small area model with level-specific penalization to overcome this issue. Model parameter estimation is performed via stochastic coordinate gradient descent. A jackknife estimator of the mean squared error is presented. The methodology is applied to combine health survey data and administrative records to estimate regional hypertension prevalence in Germany.

20.

Bayesian estimation and prediction based on lognormal record values

Sukhdev Singh Yogesh Mani Tripathi 《Journal of applied statistics》2017,44(5):916-940

In this paper we consider the problems of estimation and prediction when observed data from a lognormal distribution are based on lower record values and lower record values with inter-record times. We compute maximum likelihood estimates and asymptotic confidence intervals for model parameters. We also obtain Bayes estimates and the highest posterior density (HPD) intervals using noninformative and informative priors under squared error and LINEX loss functions. Furthermore, for the problem of Bayesian prediction under one-sample and two-sample frameworks, we obtain predictive estimates and the associated predictive equal-tail and HPD intervals. Finally, for illustration purposes, a real data set is analyzed and a simulation study is conducted to compare the methods of estimation and prediction.
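
To make the data structure concrete, the sketch below extracts lower record values from a sequence of observations, together with inter-record times. Here an inter-record time is taken to be the number of observations from one record to the next, which is an assumed convention; the function name and the data are hypothetical.

```python
def lower_records(sequence):
    """Return (record_values, inter_record_times) for lower records.

    The first observation is a record by definition; a later observation is a
    lower record if it is strictly smaller than the current record.
    """
    records = []
    times = []
    last_index = None
    for index, value in enumerate(sequence):
        if not records or value < records[-1]:
            if records:
                times.append(index - last_index)
            records.append(value)
            last_index = index
    return records, times


# Hypothetical observations in the order they were recorded.
values, gaps = lower_records([5.2, 6.1, 4.0, 4.5, 4.9, 3.3, 7.0, 1.2])
```

Only `values`, and optionally `gaps`, would be available to the analyst in the record-value setting; the remaining observations are not retained.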