M-quantile models with application to poverty mapping (cited 1 time in total: 0 self-citations, 1 citation by others)
Over the last decade there has been growing demand for estimates of population characteristics at the small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas, and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation is random effects models, which include random area effects to account for between-area variation. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier-robust inference. An alternative approach to small area estimation, based on the use of M-quantile models, was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumptions and automatically provide outlier-robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves on traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest under both an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs an outlier-robust estimation method.
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania to produce (a) district-level estimates of the incidence of poverty in Albania, (b) district-level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north east), with the wealthiest districts, which are also linked with high levels of inequality, in the coastal (south west) and southern parts of the country. We discuss the practical advantages of our methodology and note the consistency of our results with results from previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.

This paper describes an application of small area estimation (SAE) techniques under area-level spatial random effect models when only area (or district or aggregated) level data are available. In particular, the SAE approach is applied to produce district-level model-based estimates of crop yield for paddy in the state of Uttar Pradesh in India, using the data on crop-cutting experiments supervised under the Improvement of Crop Statistics scheme and the secondary data from the Population Census. Diagnostic measures are illustrated to examine the model assumptions as well as the reliability and validity of the generated model-based small area estimates. The results show a considerable gain in precision in the model-based estimates produced by applying SAE. Furthermore, the model-based estimates obtained by exploiting spatial information are more efficient than those obtained by ignoring this information. However, both of these model-based estimates are more efficient than the direct survey estimate. In many districts there are no survey data, and therefore it is not possible to produce direct survey estimates for these districts. The model-based estimates generated using SAE are still reliable for such districts. These estimates produced by using SAE will provide invaluable information to policy analysts and decision-makers.

The first step in statistical analysis is parameter estimation. In multivariate analysis, one of the parameters of interest to be estimated is the mean vector. In multivariate statistical analysis, it is usually assumed that the data come from a multivariate normal distribution. In this situation, the maximum likelihood estimator (MLE), that is, the sample mean vector, is the best estimator. However, when outliers exist in the data, the use of the sample mean vector will result in poor estimation, so other estimators that are robust to the existence of outliers should be used. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties. However, computing this estimator requires a robust estimate of the mean vector as a starting point. Usually the minimum volume ellipsoid (MVE) is used as the starting point in computing the S-estimator. For high-dimensional data, computing the MVE takes too much time. In some cases, this time is so large that existing computers cannot perform the computation. In addition to the computation time, for high-dimensional data sets the MVE method is not precise. In this paper, a robust starting point for the S-estimator based on robust clustering is proposed, which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that the proposed estimator performs precisely and much better than some of the existing robust estimators for high-dimensional data.

This paper presents a method of estimation of crop-production statistics at smaller geographical levels like a community development block (generally referred to as a block) to make area-specific plans for agricultural development programmes in India. Using available district-level data on crop yield from crop-cutting experiments and data on auxiliary variables from various administrative sources, a suitable regression model is fitted. The fitted model is then used to predict the crop production at the block level. Some scaled estimators are also developed using predicted estimates. An empirical study is also carried out to judge the merits of the proposed estimators.

This paper develops a novel weighted composite quantile regression (CQR) method for estimation of a linear model when some covariates are missing at random and the probability for the missingness mechanism can be modelled parametrically. By incorporating the unbiased estimating equations of incomplete data into empirical likelihood (EL), we obtain the EL-based weights, and then re-adjust the inverse probability weighted CQR for estimating the vector of regression coefficients. Theoretical results show that the proposed method can achieve semiparametric efficiency if the selection probability function is correctly specified; therefore, the EL weighted CQR is more efficient than the inverse probability weighted CQR. Besides, our algorithm is computationally simple and easy to implement. Simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new method to analyse the US news College data.

In testing product reliability, there is often a critical cutoff level that determines whether a specimen is classified as failed. One consequence is that the number of degradation data collected varies from specimen to specimen. Information on the random sample size should be included in the model, and our study shows that it can be influential in estimating model parameters. Two-stage least squares (LS) and maximum modified likelihood (MML) estimation, which both assume fixed sample sizes, are commonly used for estimating parameters in the repeated measurements models typically applied to degradation data. However, the LS estimate is not consistent in the case of random sample sizes. This article derives the likelihood for the random sample size model and suggests using maximum likelihood (ML) for parameter estimation. Our simulation studies show that ML estimates have smaller biases and variances compared to the LS and MML estimates. All estimation methods can be greatly improved if the number of specimens increases from 5 to 10. A data set from a semiconductor application is used to illustrate our methods.

An extended single‐index model is considered when responses are missing at random. A three‐step estimation procedure is developed to define an estimator for the single‐index parameter vector by a joint estimating equation. The proposed estimator is shown to be asymptotically normal. An algorithm for computing this estimator is proposed. This algorithm only involves one‐dimensional nonparametric smoothers, thereby avoiding the data sparsity problem caused by high model dimensionality. Some simulation studies are conducted to investigate the finite‐sample performance of the proposed estimators.

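Zhou Wei et al., Statistical Research (《统计研究》), 2015, 32(7): 81-86
Remote sensing imagery is a form of big data. Estimating crop sown area from remote sensing commonly relies on regression estimators or calibration estimators, which generally require combining ground sample data with remote sensing classification information. For most regression estimators, however, crop area estimation for a provincial population can only meet the precision requirement at the provincial level and cannot be disaggregated to smaller areas such as counties and townships. Using ground-survey sample data for Heilongjiang Province in 2011 combined with remote sensing classification results, this paper constructs a unit-level small area model in the form of a multivariate regression with multiple response variables, with the small area effects specified as fixed. Under this regression-based approach, the sown areas of the main crops can be estimated for each county, and the county estimates sum to the provincial total estimate implied by the regression model. An assessment of the precision (coefficient of variation, C.V.) of the county-level small area estimates for maize, rice and soybean in Heilongjiang shows that, on average, all of them meet the county-level precision requirement. The results indicate that small area estimation is well suited to the multi-level estimation of crop planting area for a province and its counties.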

This paper extends the univariate time series smoothing approach provided by penalized least squares to a multivariate setting, thus allowing for joint estimation of several time series trends. The theoretical results are valid for the general multivariate case, but particular emphasis is placed on the bivariate situation from an applied point of view. The proposal is based on a vector signal-plus-noise representation of the observed data that requires the first two sample moments and specifying only one smoothing constant. A measure of the amount of smoothness of an estimated trend is introduced so that an analyst can set in advance a desired percentage of smoothness to be achieved by the trend estimate. The required smoothing constant is determined by the chosen percentage of smoothness. Closed form expressions for the smoothed estimated vector and its variance-covariance matrix are derived from a straightforward application of generalized least squares, thus providing best linear unbiased estimates for the trends. A detailed algorithm applicable for estimating bivariate time series trends is also presented and justified. The theoretical results are supported by a simulation study and two real applications. One corresponds to Mexican and US macroeconomic data within the context of business cycle analysis, and the other one to environmental data pertaining to a monitored site in Scotland.
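
As a point of reference for the univariate starting point mentioned in this abstract, the sketch below shows a generic penalized least squares trend with a difference penalty. It is not the paper's multivariate estimator and does not implement its smoothness-percentage measure; the function name `penalized_ls_trend`, the penalty order and the value of `lam` are illustrative assumptions.

```python
import numpy as np

def penalized_ls_trend(y, lam, order=2):
    """Univariate penalized least squares trend (illustrative sketch).

    Minimizes ||y - tau||^2 + lam * ||D tau||^2, where D is the
    `order`-th difference matrix. Setting the gradient to zero gives
    the closed form tau = (I + lam * D'D)^{-1} y.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Differencing the rows of the identity yields D with shape (n - order, n).
    D = np.diff(np.eye(n), n=order, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Hypothetical usage: a noisy linear series smoothed with an assumed lam.
rng = np.random.default_rng(0)
t = np.arange(50.0)
trend = penalized_ls_trend(0.5 * t + rng.normal(size=t.size), lam=100.0)
```

With a second-difference penalty, a perfectly linear series is returned unchanged for any `lam`, because its second differences are zero; larger `lam` pulls a noisy series toward such a line.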

The European Union Statistics on Income and Living Conditions (EU-SILC) is the main source of information about poverty and economic inequality in the member states of the European Union. The sample sizes of its annual national surveys are sufficient for reliable estimation at the national level but not for inferences at the sub-national level, failing to respond to a rising demand from policy-makers and local authorities. We provide a comprehensive map of median income, inequality (Gini coefficient and Lorenz curve) and poverty (poverty rates) based on the equivalised household income in the countries in which the EU-SILC is conducted. We study the distribution of income of households (pro-rated to its members), not merely its median (or mean), because we regard its dispersion and frequency of lower extremes (relative poverty) as important characteristics. The estimation for the regions with small sample sizes is improved by the small-area methods. The uncertainty of complex nonlinear statistics is assessed by bootstrap. Household-level sampling weights are taken into account in both the estimates and the associated bootstrap standard errors.
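
The weighted statistics named in this abstract can be sketched as follows. This is a plain direct-estimation illustration, not the small-area or bootstrap procedure of the paper; the function names are hypothetical, and the poverty line at 60% of the weighted median is an assumption exposed as the `fraction` parameter.

```python
import numpy as np

def weighted_quantile(x, w, q):
    """Smallest value whose cumulative weight share reaches q."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cum = np.cumsum(w) / np.sum(w)
    idx = min(int(np.searchsorted(cum, q)), x.size - 1)
    return x[idx]

def weighted_gini(x, w):
    """Gini coefficient as 1 - 2 * (area under the weighted Lorenz curve)."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    p = np.concatenate(([0.0], np.cumsum(w) / np.sum(w)))          # population share
    L = np.concatenate(([0.0], np.cumsum(w * x) / np.sum(w * x)))  # income share
    area = np.sum((p[1:] - p[:-1]) * (L[1:] + L[:-1]) / 2.0)       # trapezoidal rule
    return 1.0 - 2.0 * area

def poverty_rate(x, w, fraction=0.6):
    """Weighted share of units below `fraction` times the weighted median.

    The 0.6 default is an assumption made for this sketch.
    """
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    line = fraction * weighted_quantile(x, w, 0.5)
    return np.sum(w[x < line]) / np.sum(w)
```

Sorting once and working with cumulative weight shares keeps the Lorenz curve, the Gini coefficient and the quantiles consistent with one another when sampling weights are unequal.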

The problem of estimating an unknown change-point in the mean vector or covariance matrix of a sequence of independent multivariate Gaussian random variables is considered. Adapting the estimation methodology that Hinkley pursued for the case of abrupt changes, we develop theory for deriving the asymptotic distribution of the maximum likelihood estimator of the change-point when the amount of change is a function of the sample size and goes to zero in a smooth fashion as the sample size goes to infinity, yielding a contiguous change-point model. Simulations have been performed to illustrate the closeness of the asymptotic distribution with the empirical distribution, and to evaluate its robustness to departures from normality for reasonable sample sizes as well as parameter changes. Finally, we apply the methodology to estimate the change-point in the daily log-returns data of BLS (BellSouth) and VZ (Verizon) from NYSE.

Many large-scale sample surveys use panel designs under which sampled individuals are interviewed several times before being dropped from the sample. The longitudinal databases available from such surveys could be used to provide estimates of gross change over time. One problem in using these data to estimate gross change is how to handle the period-to-period nonresponse. This nonresponse is typically nonrandom and, furthermore, may be nonignorable in that it cannot be accounted for by other observed quantities in the data. Under the models proposed in this article, which are appropriate for the analysis of categorical data, the probability of nonresponse may be taken to be a function of the missing variable of interest. The proposed models are fit using maximum likelihood estimation. As an example, the method is applied to the problem of estimating gross flows in labor-force participation using data from the Current Population Survey and the Canadian Labour Force Survey.

Small area estimation has received considerable attention in recent years because of growing demand for small area statistics. Basic area‐level and unit‐level models have been studied in the literature to obtain empirical best linear unbiased prediction (EBLUP) estimators of small area means. Although this classical method is useful for estimating the small area means efficiently under normality assumptions, it can be highly influenced by the presence of outliers in the data. In this article, the authors investigate the robustness properties of the classical estimators and propose a resistant method for small area estimation, which is useful for downweighting any influential observations in the data when estimating the model parameters. To estimate the mean squared errors of the robust estimators of small area means, a parametric bootstrap method is adopted here, which is applicable to models with block diagonal covariance structures. Simulations are carried out to study the behaviour of the proposed robust estimators in the presence of outliers, and these estimators are also compared to the EBLUP estimators. Performance of the bootstrap mean squared error estimator is also investigated in the simulation study. The proposed robust method is also applied to some real data to estimate crop areas for counties in Iowa, using farm‐interview data on crop areas and LANDSAT satellite data as auxiliary information. The Canadian Journal of Statistics 37: 381–399; 2009 © 2009 Statistical Society of Canada

In this paper, a method for estimating monotone, convex and log-concave densities is proposed. The estimation procedure consists of an unconstrained kernel estimator which is modified in a second step with respect to the desired shape constraint by using monotone rearrangements. It is shown that the resulting estimate is a density itself and shares the asymptotic properties of the unconstrained estimate. A short simulation study shows the finite sample behavior.
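
The two-step idea in this abstract (unconstrained kernel estimate, then a monotone rearrangement) can be illustrated on a grid for the simplest case of a decreasing density. The sketch below is a discrete analogue under stated assumptions, not the paper's estimator: the Gaussian kernel, the fixed bandwidth, the equispaced grid and both function names are choices made here for illustration.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, bandwidth):
    """Unconstrained Gaussian kernel density estimate evaluated on `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - sample[None, :]) / bandwidth
    norm = sample.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def decreasing_rearrangement(values):
    """Decreasing rearrangement of function values on an equispaced grid.

    Sorting in descending order keeps the multiset of values, so the
    Riemann sum (the approximate integral) over the grid is unchanged.
    """
    return np.sort(np.asarray(values, dtype=float))[::-1]

# Hypothetical usage with made-up data and an assumed bandwidth.
sample = np.array([0.1, 0.2, 0.4, 0.9, 1.5])
grid = np.linspace(0.0, 3.0, 61)
f_hat = gaussian_kde_on_grid(sample, grid, bandwidth=0.3)
f_mono = decreasing_rearrangement(f_hat)
```

Because the rearranged values are the same numbers in a different order, the modified estimate still sums to the same total on the grid, which mirrors the abstract's statement that the result remains a density.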

A systematic procedure for the derivation of linearized variables for the estimation of sampling errors of complex nonlinear statistics involved in the analysis of poverty and income inequality is developed. The linearized variable extends the use of standard variance estimation formulae, developed for linear statistics such as sample aggregates, to nonlinear statistics. The context is that of cross-sectional samples of complex design and reasonably large size, as typically used in population-based surveys. Results of application of the procedure to a wide range of poverty and inequality measures are presented. Standardized software for this purpose has been developed and can be provided to interested users on request. Procedures are provided for the estimation of the design effect and its decomposition into the contribution of unequal sample weights and of other design complexities such as clustering and stratification. The consequence of treating a complex statistic as a simple ratio in estimating its sampling error is also quantified. The second theme of the paper is to compare the linearization approach with an alternative approach based on the concept of replication, namely the Jackknife repeated replication (JRR) method. The basis and application of the JRR method are described, the exposition paralleling that of the linearization method but in somewhat less detail. Based on data from an actual national survey, estimates of standard errors and design effects from the two methods are analysed and compared. The numerical results confirm that the two alternative approaches generally give very similar results, though notable differences can exist for certain statistics. Relative advantages and limitations of the approaches are identified.
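
The replication idea behind the jackknife can be shown in its simplest form. The sketch below is the textbook delete-one jackknife under an assumed simple random sample; it is not the JRR procedure for complex designs described in the abstract, which drops primary sampling units within strata and reweights. The function name `jackknife_variance` is hypothetical.

```python
import numpy as np

def jackknife_variance(data, statistic):
    """Delete-one jackknife variance estimate of `statistic`.

    Assumes a simple random sample (an assumption of this sketch):
    v = (n - 1) / n * sum_i (theta_(i) - mean(theta_(.)))^2,
    where theta_(i) is the statistic recomputed without observation i.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    replicates = np.array(
        [statistic(np.delete(data, i, axis=0)) for i in range(n)]
    )
    return (n - 1) / n * np.sum((replicates - replicates.mean()) ** 2)

# Hypothetical usage: variance of the sample mean of made-up values.
y = np.array([1.0, 2.0, 4.0, 7.0])
v_mean = jackknife_variance(y, np.mean)
```

For the sample mean the jackknife estimate coincides with the usual formula s^2 / n, which makes it a convenient check before applying the same function to a nonlinear statistic such as a ratio or an inequality measure.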

Estimation of a general multi-index model comprises determining the number of linear combinations of predictors (structural dimension) that are related to the response, estimating the loadings of each index vector, selecting the active predictors and estimating the underlying link function. These objectives are often achieved sequentially at different stages of the estimation process. In this study, we propose a unified estimation approach under a semi-parametric model framework to attain these estimation goals simultaneously. The proposed estimation method is more efficient and stable than many existing methods where the estimation error in the structural dimension may propagate to the estimation of the index vectors and variable selection stages. A detailed algorithm is provided to implement the proposed method. Comprehensive simulations and a real data analysis illustrate the effectiveness of the proposed method.

This paper proposes a semi-parametric modelling and estimating method for analysing censored survival data. The proposed method uses the empirical likelihood function to describe the information in data, and formulates estimating equations to incorporate knowledge of the underlying distribution and regression structure. The method is more flexible than traditional methods such as parametric maximum likelihood estimation (MLE), Cox's (1972) proportional hazards model, the accelerated life test model, quasi-likelihood (Wedderburn, 1974) and generalized estimating equations (Liang & Zeger, 1986). This paper shows the existence and uniqueness of the proposed semi-parametric maximum likelihood estimates (SMLE) with estimating equations. The method is validated with known cases studied in the literature. Several finite-sample simulation and large-sample efficiency studies indicate that when the sample size is larger than 100 the SMLE is compatible with the parametric MLE; and in all case studies, the SMLE is about 15% better than the parametric MLE with a mis-specified underlying distribution.

The traditional method for estimating or predicting linear combinations of the fixed effects and realized values of the random effects in mixed linear models is first to estimate the variance components and then to proceed as if the estimated values of the variance components were the true values. This two-stage procedure gives unbiased estimators or predictors of the linear combinations provided the data vector is symmetrically distributed about its expected value and provided the variance component estimators are translation-invariant and are even functions of the data vector. The standard procedures for estimating the variance components yield even, translation-invariant estimators.

This paper focuses on efficient estimation, optimal rates of convergence and effective algorithms in the partly linear additive hazards regression model with current status data. We use polynomial splines to estimate both the cumulative baseline hazard function, with a monotonicity constraint, and the nonparametric regression functions, with no such constraint. We propose a simultaneous sieve maximum likelihood estimation for regression parameters and nuisance parameters and show that the resultant estimator of the regression parameter vector is asymptotically normal and achieves the semiparametric information bound. In addition, we show that the rates of convergence for the estimators of the nonparametric functions are optimal. We implement the proposed estimation through a backfitting algorithm on generalized linear models. We conduct simulation studies to examine the finite‐sample performance of the proposed estimation method and present an analysis of renal function recovery data for illustration.

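A study of adaptive Lasso quantile regression for panel data (cited 1 time in total: 0 self-citations, 1 citation by others)
How to select important explanatory variables automatically while estimating parameters has long been one of the most discussed problems in quantile regression models for panel data. By constructing a Bayesian hierarchical quantile regression model with multiple random effects, and assuming that the fixed-effect coefficients follow a new conditional Laplace prior, a Gibbs sampling algorithm for estimating the model parameters is given. Because the coefficients of explanatory variables of differing importance should be shrunk to different degrees, the constructed prior is adaptive and can accurately select the important explanatory variables in the model automatically, and the slice Gibbs sampling algorithm designed here quickly and effectively solves the problem of estimating the posterior means of the model parameters. Simulation results show that the new method outperforms the methods commonly used in the existing literature in both the accuracy of parameter estimation and the accuracy of variable selection. The method's ability to estimate parameters and select variables is demonstrated by modelling panel data on several macroeconomic indicators for the regions of China.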
