Similar literature
1.
M-quantile models with application to poverty mapping   Cited by: 1 (self-citations: 0, citations by others: 1)
Over the last decade there has been growing demand for estimates of population characteristics at small area level. Unfortunately, cost constraints in the design of sample surveys lead to small sample sizes within these areas, and as a result direct estimation, using only the survey data, is inappropriate since it yields estimates with unacceptable levels of precision. Small area models are designed to tackle the small sample size problem. The most popular class of models for small area estimation is random effects models that include random area effects to account for between-area variation. However, such models also depend on strong distributional assumptions, require a formal specification of the random part of the model and do not easily allow for outlier-robust inference. An alternative approach to small area estimation that is based on the use of M-quantile models was recently proposed by Chambers and Tzavidis (Biometrika 93(2):255–268, 2006) and Tzavidis and Chambers (Robust prediction of small area means and distributions. Working paper, 2007). Unlike traditional random effects models, M-quantile models do not depend on strong distributional assumptions and automatically provide outlier-robust inference. In this paper we illustrate for the first time how M-quantile models can be practically employed for deriving small area estimates of poverty and inequality. The methodology we propose improves on traditional poverty mapping methods in the following ways: (a) it enables the estimation of the distribution function of the study variable within the small area of interest both under an M-quantile and a random effects model, (b) it provides analytical, instead of empirical, estimation of the mean squared error of the M-quantile small area mean estimates and (c) it employs an outlier-robust estimation method.
The methodology is applied to data from the 2002 Living Standards Measurement Survey (LSMS) in Albania to obtain (a) district-level estimates of the incidence of poverty in Albania, (b) district-level inequality measures and (c) the distribution function of household per-capita consumption expenditure in each district. Small area estimates of poverty and inequality show that the poorest Albanian districts are in the mountainous regions (north and north east), with the wealthiest districts, which are also linked with high levels of inequality, in the coastal (south west) and southern parts of the country. We discuss the practical advantages of our methodology and note the consistency of our results with results from previous studies. We further demonstrate the usefulness of the M-quantile estimation framework through design-based simulations based on two realistic survey data sets containing small area information and show that the M-quantile approach may be preferable when the aim is to estimate the small area distribution function.

2.
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than the existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.

3.
Small area estimation has received considerable attention in recent years because of growing demand for small area statistics. Basic area‐level and unit‐level models have been studied in the literature to obtain empirical best linear unbiased prediction (EBLUP) estimators of small area means. Although this classical method is useful for estimating the small area means efficiently under normality assumptions, it can be highly influenced by the presence of outliers in the data. In this article, the authors investigate the robustness properties of the classical estimators and propose a resistant method for small area estimation, which is useful for downweighting any influential observations in the data when estimating the model parameters. To estimate the mean squared errors of the robust estimators of small area means, a parametric bootstrap method is adopted here, which is applicable to models with block diagonal covariance structures. Simulations are carried out to study the behaviour of the proposed robust estimators in the presence of outliers, and these estimators are also compared to the EBLUP estimators. Performance of the bootstrap mean squared error estimator is also investigated in the simulation study. The proposed robust method is also applied to some real data to estimate crop areas for counties in Iowa, using farm‐interview data on crop areas and LANDSAT satellite data as auxiliary information. The Canadian Journal of Statistics 37: 381–399; 2009 © 2009 Statistical Society of Canada

4.
In practical survey sampling, missing data are unavoidable due to nonresponse, rejected observations by editing, disclosure control, or outlier suppression. We propose a calibrated imputation approach so that valid point and variance estimates of the population (or domain) totals can be computed by the secondary users using simple complete‐sample formulae. This is especially helpful for variance estimation, which generally requires additional information and tools that are unavailable to the secondary users. Our approach is natural for continuous variables, where the estimation may be either based on reweighting or imputation, including possibly their outlier‐robust extensions. We also propose a multivariate procedure to accommodate the estimation of the covariance matrix between estimated population totals, which facilitates variance estimation of the ratios or differences among the estimated totals. We illustrate the proposed approach using simulation data in supplementary materials that are available online.

5.
In this paper, we consider the problem of estimating the number of components of a superimposed nonlinear sinusoids model of a signal in the presence of additive noise. We propose and provide a detailed empirical comparison of robust methods for estimation of the number of components. The proposed methods, which are robust modifications of the commonly used information theoretic criteria, are based on various M-estimator approaches and are robust with respect to outliers present in the data and heavy-tailed noise. The proposed methods are compared with the usual non-robust methods through extensive simulations under varied model scenarios. We also present real signal analysis of two speech signals to show the usefulness of the proposed methodology.

6.
This paper studies outlier detection and accommodation in general spatial models, which include spatial autoregressive models and spatial error models as special cases. Using mean-shift and variance-weight models respectively, test statistics for multiple outliers are derived and detection procedures are proposed. In addition, several key diagnostic measures, such as standardized residuals and leverage measures, are defined for general spatial models. Outlier-modified models are proposed to accommodate outliers in the data set. The performance of the test statistics, including size and power, is examined via simulation studies. Three real examples are analyzed, and the results show that the proposed methodology is useful for identifying and accommodating outliers in general spatial models.

7.
Small area estimation techniques are increasingly used in survey applications to provide estimates for local areas of interest. The objective of this article is to develop and apply Information Theoretic (IT)-based formulations to estimate small area business and trade statistics. More specifically, we propose a Generalized Maximum Entropy (GME) approach to the problem of small area estimation that exploits auxiliary information relating to other known variables on the population and adjusts for consistency and additivity. The GME formulations, which combine information from the sample with out-of-sample aggregates of the population of interest, can be particularly useful in the context of small area estimation, for both direct and model-based estimators, since they do not require strong distributional assumptions on the disturbances. The performance of the proposed IT formulations is illustrated through real and simulated datasets.

8.
Small‐area estimation techniques have typically relied on plug‐in estimation based on models containing random area effects. More recently, regression M‐quantiles have been suggested for this purpose, thus avoiding conventional Gaussian assumptions, as well as problems associated with the specification of random effects. However, the plug‐in M‐quantile estimator for the small‐area mean can be shown to be the expected value of this mean with respect to a generally biased estimator of the small‐area cumulative distribution function of the characteristic of interest. To correct this problem, we propose a general framework for robust small‐area estimation, based on representing a small‐area estimator as a functional of a predictor of this small‐area cumulative distribution function. Key advantages of this framework are that it naturally leads to integrated estimation of small‐area means and quantiles and is not restricted to M‐quantile models. We also discuss mean squared error estimation for the resulting estimators, and demonstrate the advantages of our approach through model‐based and design‐based simulations, with the latter using economic data collected in an Australian farm survey.

9.
There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling, owing to its sparsity, it is sensitive to outliers because it relies on penalized least-squares methods. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function and imposes an L1-type penalty called the elastic net. We also introduce the use of efficient bootstrap information criteria for choosing the optimal regularization parameters and a constant used in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

10.
The zero-inflated Poisson distribution has been used in the modeling of count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes; outlier identification and robust parameter estimation are therefore important for such a distribution. Some outlier identification methods are studied in this paper, and their applications and results are also presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of our proposed parameter estimates.
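The two building blocks named in this abstract can be illustrated on a plain sample. The sketch below shows only the generic trimmed and Winsorized means; it is not the paper's zero-inflated Poisson parameter estimator, and the function names, the `proportion` argument and the example counts are assumptions made for illustration.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the k smallest and k largest values, k = floor(proportion * n)."""
    data = sorted(values)
    n = len(data)
    k = int(proportion * n)
    if 2 * k >= n:
        raise ValueError("proportion removes every observation")
    kept = data[k:n - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after replacing the k smallest and k largest values by their nearest kept neighbours."""
    data = sorted(values)
    n = len(data)
    k = int(proportion * n)
    if 2 * k >= n:
        raise ValueError("proportion replaces every observation")
    low, high = data[k], data[n - k - 1]
    clipped = [min(max(x, low), high) for x in data]
    return sum(clipped) / n


# Hypothetical count data with many zeroes and one gross outlier (40).
counts = [0, 0, 0, 1, 1, 2, 2, 3, 3, 40]
plain = sum(counts) / len(counts)          # pulled upward by the outlier
trimmed = trimmed_mean(counts, 0.1)        # drops one value from each tail
winsorized = winsorized_mean(counts, 0.1)  # clips one value in each tail
```

Both robust means ignore or cap the single extreme count, whereas the plain mean is dominated by it. Trimming discards tail observations, while Winsorizing keeps the sample size and only pulls the tails inward.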

11.
We develop a new methodology for determining the location and dynamics of brain activity from combined magnetoencephalography (MEG) and electroencephalography (EEG) data. The resulting inverse problem is ill‐posed and is one of the most difficult problems in neuroimaging data analysis. In our development we propose a solution that combines the data from three different modalities: magnetic resonance imaging (MRI), MEG and EEG. We propose a new Bayesian spatial finite mixture model that builds on the mesostate‐space model developed by Daunizeau & Friston [Daunizeau and Friston, NeuroImage 2007; 38, 67–81]. Our new model incorporates two major extensions: (i) we combine EEG and MEG data and formulate a joint model for dealing with the two modalities simultaneously; (ii) we incorporate the Potts model to represent the spatial dependence in an allocation process that partitions the cortical surface into a small number of latent states termed mesostates. The cortical surface is obtained from MRI. We formulate the new spatiotemporal model and derive an efficient procedure for simultaneous point estimation and model selection based on the iterated conditional modes algorithm combined with local polynomial smoothing. The proposed method results in a novel estimator for the number of mixture components and is able to select active brain regions, which correspond to active variables in a high‐dimensional dynamic linear model. The methodology is investigated using synthetic data and simulation studies and then demonstrated on an application examining the neural response to the perception of scrambled faces. R software implementing the methodology, along with several sample datasets, is available at the following GitHub repository: https://github.com/v2south/PottsMix . The Canadian Journal of Statistics 47: 688–711; 2019 © 2019 Statistical Society of Canada

12.
M. C. Pardo, Statistics, 2013, 47(5): 1071-1091
In this paper, we focus on repeated measurement problems, which form an interesting research area in statistics. We study longitudinal data, which arise when outcomes are observed repeatedly on each experimental subject at several points. For this type of data, in which the observations lack independence, we focus on the marginal approach proposed by Dale [Global cross-ratio models for bivariate, discrete, ordered responses. Biometrics. 1986;42(4):909–917] for bivariate, discrete, ordered responses. We propose an estimation method based on divergence measures as an alternative to the full likelihood method proposed in that paper. Finally, an extensive simulation study and a data example illustrating the new methodology are provided.

13.
A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that proceeds from clustering the n observations, through a test of the 'no-outlier hypothesis' (TONH), to a weighted least-squares regression estimation. The cluster phase partitions the n observations into a set of size h, called the main cluster, and a minor cluster of size n - h. A robust distance emerges from the main cluster, upon which the test of the no-outlier hypothesis is conducted. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, into the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers where other methods are susceptible to masking and swamping. The CBRR does not require heavy computation and is largely not susceptible to masking and swamping.
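The final phase described in this abstract, re-weighting until convergence, can be sketched for a single predictor. This is a minimal illustration and not the CBRR procedure: the clustering and TONH phases are omitted, and the hard-rejection weights, the MAD-based scale, the cutoff `c` and all function names are assumptions of this sketch, since the abstract only states that the weights are based on the normalized residuals.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def reweighted_least_squares(x, y, c=2.5, max_iter=50, tol=1e-10):
    """Iterate: fit, normalize residuals by a robust scale, re-weight, refit."""
    w = [1.0] * len(x)                       # start from ordinary least squares
    intercept, slope = weighted_line_fit(x, y, w)
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Assumed scale: median absolute residual, scaled for normal consistency.
        scale = 1.4826 * statistics.median([abs(r) for r in residuals])
        if scale < 1e-12:                    # essentially exact fit; keep current weights
            break
        # Assumed weight rule: drop points whose normalized residual exceeds c.
        w = [1.0 if abs(r) / scale <= c else 0.0 for r in residuals]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope, w


# Hypothetical data: points on y = 2 + 3x with the last response replaced by 100.
x = list(range(10))
y = [2 + 3 * xi for xi in x]
y[-1] = 100
intercept, slope, weights = reweighted_least_squares(x, y)
```

In this example the initial ordinary fit is tilted by the contaminated point; after one re-weighting step that point receives weight zero and the fit returns to the line through the remaining nine points. A smooth weight function (for example Huber-type weights) could be swapped in for the hard cutoff.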

14.
Spatial robust small area estimation   Cited by: 1 (self-citations: 0, citations by others: 1)
The accuracy of recent applications in small area statistics in many cases depends heavily on the assumed properties of the underlying models and on the availability of micro information. In finite population sampling, small sample sizes may increase the sensitivity of the modeling to single units. In these cases, area-specific sample sizes tend to be so small that normality assumptions, even for area means, seem to be violated. Hence, applying robust estimation methods is expected to yield more reliable results. In general, two robust small area methods are applied: the robust EBLUP and the M-quantile method. Additionally, the use of adequate auxiliary information may further increase the accuracy of the estimates. In prediction-based approaches, where information is needed at the universe level, in general only a few variables are available for modeling. In addition to variables from the dataset, further information may be available in many cases, e.g. geographical information that could indicate spatial dependencies between neighboring areas. This spatial information can be included in the modeling using spatially correlated area effects. In this paper the classical robust EBLUP is extended to cover spatial area effects via a simultaneous autoregressive model. The performance of the different estimators is compared in a model-based simulation study.

15.
Regression analysis is one of the methods most widely used in prediction problems. Although many methods are used for parameter estimation in regression analysis, the ordinary least squares (OLS) technique is the most common among them. However, this technique is highly sensitive to outliers. Therefore, robust techniques are suggested in the literature when the data set includes outliers. Moreover, in a prediction problem, using techniques that reduce the influence of outliers and using the median rather than the mean of the errors as the target function will be more successful in modeling these kinds of data. In this study, a new parameter estimation method based on particle swarm optimization is proposed; its target function is the median of the absolute relative errors, obtained by dividing the difference between the observed and predicted values by the observed value. The performance of the proposed method is evaluated in a simulation study by comparing it with OLS and some other robust methods in the literature.
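The target function described in this abstract can be written down directly. The sketch below computes only the objective, the median of the absolute relative errors, next to a mean squared error for contrast; the particle swarm search over regression parameters is not shown, and the function names and example values are assumptions made for illustration.

```python
import statistics


def median_absolute_relative_error(observed, predicted):
    """Median of |observed - predicted| / |observed| over all observations."""
    if any(o == 0 for o in observed):
        raise ValueError("relative error is undefined for a zero observation")
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return statistics.median(ratios)


def mean_squared_error(observed, predicted):
    """Mean-based criterion, shown only for comparison with the median-based one."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical data: the last observation is an outlier the model does not follow.
observed = [10, 20, 30, 40, 1000]
predicted = [11, 18, 30, 44, 50]
robust_loss = median_absolute_relative_error(observed, predicted)
mean_loss = mean_squared_error(observed, predicted)
```

Because the median depends only on the middle of the sorted relative errors, the single badly predicted outlier leaves `robust_loss` unchanged, while `mean_loss` is dominated by it. An optimizer minimizing the median-based criterion is therefore not pulled toward the outlier.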

16.
In this paper, we consider the estimation and inference of the parameters and the nonparametric part in partially linear quantile regression models with responses that are missing at random. First, we extend the normal approximation (NA)-based methods of Sun (2005) to the missing data case. However, the asymptotic covariance matrices of NA-based methods are difficult to estimate, which complicates inference. To overcome this problem, alternatively, we propose the smoothed empirical likelihood (SEL)-based methods. We define SEL statistics for the parameters and the nonparametric part and demonstrate that the limiting distributions of the statistics are chi-squared distributions. Accordingly, confidence regions can be obtained without the estimation of the asymptotic covariance matrices. Monte Carlo simulations are conducted to evaluate the performance of the proposed method. Finally, the NA- and SEL-based methods are applied to real data.

17.
In recent years, Bayesian statistical methods in neuroscience have advanced considerably. In particular, detection of brain signals for studying the complexity of the brain is an active area of research. Functional magnetic resonance imaging (fMRI) is an important tool for determining which parts of the brain are activated by different types of physical behavior. According to recent results, there is evidence that the values of the connectivity brain signal parameters are close to zero, and, given the high-frequency behavior of fMRI time series data, Bayesian dynamic models for identifying sparsity are indeed far-reaching. We propose a multivariate Bayesian dynamic approach for model selection and shrinkage estimation of the connectivity parameters. We describe the coupling or lead-lag between any pair of regions by using mixture priors for the connectivity parameters and propose a new weakly informative default prior for the state variances. This framework produces proper one-step-ahead posterior predictive results and induces shrinkage and robustness suitable for fMRI data in the presence of sparsity. To explore the performance of the proposed methodology, we present simulation studies and an application to functional magnetic resonance imaging data.

18.
Spatially correlated data appear in many environmental studies, and consequently there is an increasing demand for estimation methods that take account of spatial correlation and thereby improve the accuracy of estimation. In this paper we propose an iterative nonparametric procedure for modelling spatial data with general correlation structures. The asymptotic normality of the proposed estimators is established under mild conditions. We demonstrate, using both simulation and case studies, that the proposed estimators are more efficient than the traditional locally linear methods which fail to account for spatial correlation.

19.
In survey sampling, policymaking regarding the allocation of resources to subgroups (called small areas) or the determination of subgroups with specific properties in a population should be based on reliable estimates. Information, however, is often collected at a different scale than that of these subgroups; hence, the estimation can only be obtained on finer scale data. Parametric mixed models are commonly used in small‐area estimation. The relationship between predictors and response, however, may not be linear in some real situations. Recently, small‐area estimation using a generalised linear mixed model (GLMM) with a penalised spline (P‐spline) regression model, for the fixed part of the model, has been proposed to analyse cross‐sectional responses, both normal and non‐normal. However, there are many situations in which the responses in small areas are serially dependent over time. Such a situation is exemplified by a data set on the annual number of visits to physicians by patients seeking treatment for asthma, in different areas of Manitoba, Canada. In cases where covariates that can possibly predict physician visits by asthma patients (e.g. age and genetic and environmental factors) may not have a linear relationship with the response, new models for analysing such data sets are required. In the current work, using both time‐series and cross‐sectional data methods, we propose P‐spline regression models for small‐area estimation under GLMMs. Our proposed model covers both normal and non‐normal responses. In particular, the empirical best predictors of small‐area parameters and their corresponding prediction intervals are studied with the maximum likelihood estimation approach being used to estimate the model parameters. The performance of the proposed approach is evaluated using some simulations and also by analysing two real data sets (precipitation and asthma).

20.
A criterion for robust estimation of location and covariance matrix is considered, and its application in outlier labeling is discussed. This method, unlike the methods based on MVE and MCD, is applicable to large and high-dimensional data sets. The method proposed here is also robust and has the same breakdown point as the MVE- and MCD-based methods. Furthermore, the computational complexity of the proposed method is significantly smaller than that of other methods.
