Similar documents
20 similar documents found.
1.

Among the best-known estimators of multivariate location and scatter is the Minimum Volume Ellipsoid (MVE). Many algorithms have been proposed to compute it. Most of them merely attempt to approximate the exact MVE as closely as possible, but some have led to the definition of new estimators that retain the robustness and affine equivariance that make the MVE so attractive. Rousseeuw and van Zomeren (1990) used the $(p+1)$-subset estimator, which Croux and Haesbroeck (1997) modified to give the averaged $(p+1)$-subset estimator. This note shows by means of simulations that the averaged $(p+1)$-subset estimator outperforms the exact estimator in terms of finite-sample efficiency. We also present a new robust estimator for the MVE, closely related to the averaged $(p+1)$-subset estimator, that yields a natural ranking of the data.
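A minimal sketch of the basic $(p+1)$-subset idea for bivariate data ($p = 2$), for illustration only; it is not the averaged estimator or the new ranking estimator from the note. It draws random 3-point subsets, inflates each subset's covariance ellipse until it covers $h = \lfloor (n+p+1)/2 \rfloor$ points, and keeps the ellipse with the smallest volume. The function names, number of trials and simulated data are assumptions, and no consistency correction is applied to the scatter matrix.

```python
import random


def _mean_cov(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mve_subset(points, n_trials=500, seed=0):
    """Approximate the bivariate MVE by resampling (p+1)-subsets (p = 2)."""
    rng = random.Random(seed)
    n, p = len(points), 2
    h = (n + p + 1) // 2  # the ellipse must cover at least h points
    best = None
    for _ in range(n_trials):
        subset = rng.sample(points, p + 1)
        (mx, my), (sxx, sxy, syy) = _mean_cov(subset)
        det = sxx * syy - sxy ** 2
        if det <= 1e-12:  # (nearly) collinear subset: no proper ellipse
            continue
        # Squared Mahalanobis distances w.r.t. the subset mean and covariance.
        d2 = sorted(
            (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
            for x, y in points
        )
        m2 = d2[h - 1]  # inflation factor so that h points are covered
        objective = det * m2 ** p  # proportional to the squared ellipse volume
        if best is None or objective < best[0]:
            best = (objective, (mx, my), (m2 * sxx, m2 * sxy, m2 * syy))
    return best[1], best[2]


rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
data += [(10 + rng.random(), 10 + rng.random()) for _ in range(10)]  # outliers
center, scatter = mve_subset(data)
print(center, scatter)
```

Because the volume criterion depends only on the $h$-th smallest distance, the ten points near (10, 10) cannot pull the selected ellipse toward them.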

2.
In linear regression, outliers and leverage points often have a large influence on the model selection process. Here such cases are downweighted with Mallows-type weights during estimation of the submodel parameters by generalised M-estimation. A robust version of Mallows's $C_p$ (Ronchetti & Staudte, 1994) is then used to select a variety of submodels that are as informative as the full model. The methodology is illustrated on a new dataset concerning the agglomeration of alumina in Bayer precipitation.

3.

Asymptotic confidence (delta) intervals and intervals based on Fieller's theorem are alternative methods for constructing intervals for the $\gamma$% effective dose (ED$_\gamma$). Sitter and Wu (1993) compared the two approaches for the ED$_{50}$ when a logistic dose–response curve is assumed, and showed that the Fieller intervals are generally superior. In this paper, we introduce two new families of intervals, both of which include the delta and Fieller intervals as special cases. We also consider interval estimation of the ED$_{90}$ as well as the ED$_{50}$. We compare the various methods for constructing a confidence interval for the ED$_\gamma$.
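To make the delta/Fieller contrast concrete, here is a small sketch for a logistic model $\operatorname{logit} p = \beta_0 + \beta_1 x$. Writing $\beta_0' = \beta_0 - \operatorname{logit}(\gamma/100)$, the ED$_\gamma$ solves $\beta_0' + \beta_1 x = 0$. The delta interval is symmetric about the point estimate, whereas the Fieller interval collects all $x$ with $(\hat\beta_0' + x\hat\beta_1)^2 \le z^2 \operatorname{Var}(\hat\beta_0' + x\hat\beta_1)$ and is generally asymmetric. The coefficient estimates and covariance entries in the call are hypothetical numbers, and the paper's two new families of intervals are not implemented.

```python
import math
from statistics import NormalDist


def ed_intervals(b0, b1, v00, v01, v11, gamma=50.0, level=0.95):
    """Delta and Fieller intervals for ED_gamma under logit(p) = b0 + b1 * x."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    q = gamma / 100.0
    a0 = b0 - math.log(q / (1 - q))  # shifted intercept: ED_gamma solves a0 + b1*x = 0
    ed = -a0 / b1
    # Delta method: Var(ED) ~ (v00 + 2*ed*v01 + ed^2*v11) / b1^2
    se = math.sqrt(v00 + 2 * ed * v01 + ed ** 2 * v11) / abs(b1)
    delta = (ed - z * se, ed + z * se)
    # Fieller: (a0 + b1*x)^2 <= z^2 * Var(a0 + b1*x), i.e. A*x^2 + 2*B*x + C <= 0
    A = b1 ** 2 - z ** 2 * v11
    B = a0 * b1 - z ** 2 * v01
    C = a0 ** 2 - z ** 2 * v00
    disc = B ** 2 - A * C
    if A <= 0 or disc < 0:
        raise ValueError("Fieller set is not a bounded interval (slope not significant)")
    root = math.sqrt(disc)
    fieller = ((-B - root) / A, (-B + root) / A)
    return ed, delta, fieller


ed, delta, fieller = ed_intervals(-4.0, 2.0, 0.64, -0.15, 0.04)
print(ed, delta, fieller)
```

With these inputs ED$_{50} = 2$; the delta interval is symmetric about 2, while the Fieller interval reaches further to the left than to the right.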

4.
X. Guyon, C. Hardouin. Statistics, 2013, 47(4): 339–363
This study deals with the time dynamics of Markov fields defined on a finite set of sites with state space $E$, focusing on Markov Chain Markov Field (MCMF) evolution. Such a model is characterized by two families of potentials: instantaneous interaction potentials and time-delay potentials. Four models are specified: auto-exponential dynamics ($E = \mathbb{R}^+$), auto-normal dynamics ($E = \mathbb{R}$), auto-Poissonian dynamics ($E = \mathbb{N}$) and auto-logistic dynamics ($E$ qualitative and finite). Sufficient conditions ensuring ergodicity and a strong law of large numbers are given using a Lyapunov stability criterion, and the conditional pseudo-likelihood statistics are summarized. We discuss the identification of the two Markovian graphs and look for validation tests using martingale central limit theorems. An application to meteorological data illustrates this modelling approach.
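As an illustration of auto-logistic dynamics, the sketch below assumes a hypothetical parametrisation: binary sites on a ring, with conditional log-odds at site $i$ and time $t$ equal to $\alpha + \beta \sum_{j \sim i} x_j(t) + \delta\, x_i(t-1)$, so $\beta$ plays the role of the instantaneous interaction and $\delta$ of the time-delay potential. Each time step is drawn approximately by Gibbs sweeps conditional on the previous configuration, and the conditional log pseudo-likelihood is the sum of the site-wise logistic log-probabilities. The ring graph, parameter values and number of sweeps are assumptions, not the authors' specification.

```python
import math
import random


def _eta(cur, prev, i, alpha, beta, delta):
    """Conditional log-odds at site i on a ring (hypothetical parametrisation)."""
    n = len(cur)
    neighbours = cur[(i - 1) % n] + cur[(i + 1) % n]  # instantaneous term (time t)
    return alpha + beta * neighbours + delta * prev[i]  # delta: time-delay term


def simulate(n_sites, n_steps, alpha, beta, delta, sweeps=20, seed=0):
    """Simulate binary MCMF-style dynamics; each step uses Gibbs sweeps given step t-1."""
    rng = random.Random(seed)
    prev = [rng.randint(0, 1) for _ in range(n_sites)]
    path = [prev]
    for _ in range(n_steps):
        cur = prev[:]  # start the Gibbs sampler at the previous configuration
        for _ in range(sweeps):
            for i in range(n_sites):
                p1 = 1.0 / (1.0 + math.exp(-_eta(cur, prev, i, alpha, beta, delta)))
                cur[i] = 1 if rng.random() < p1 else 0
        path.append(cur)
        prev = cur
    return path


def log_pseudo_likelihood(path, alpha, beta, delta):
    """Conditional log pseudo-likelihood: sum of site-wise logistic log-probabilities."""
    total = 0.0
    for prev, cur in zip(path, path[1:]):
        for i in range(len(cur)):
            eta = _eta(cur, prev, i, alpha, beta, delta)
            total += cur[i] * eta - math.log1p(math.exp(eta))
    return total


path = simulate(30, 20, alpha=-1.0, beta=0.8, delta=1.0)
print(sum(map(sum, path[1:])) / (30 * 20))
print(log_pseudo_likelihood(path, -1.0, 0.8, 1.0))
```

Maximising `log_pseudo_likelihood` over $(\alpha, \beta, \delta)$ would give a pseudo-likelihood estimate under these assumptions.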

5.
6.
7.
Regression analysis aims to estimate the approximate relationship between a response variable and explanatory variables. This can be done with classical methods such as ordinary least squares, but these methods are very sensitive to anomalous points, often called outliers, in the data set. The main contribution of this article is a new version of the Generalized M-estimator that provides good resistance against vertical outliers and bad leverage points. Its advantage over existing methods is that it does not downweight good leverage points, which increases the efficiency of the estimator. To achieve this, the fixed-parameters support vector regression technique is used to identify outliers and bad leverage points and reduce their weights. The effectiveness of the proposed estimator is investigated using real and simulated data sets.
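The generic GM (Mallows-type) fitting step can be sketched with iteratively reweighted least squares for a straight-line model: each observation receives a fixed leverage weight multiplied by a Huber weight $\psi(u)/u$ on its scaled residual. In the paper the weights come from fixed-parameters support vector regression; here they are simply supplied by hand (0.05 for the bad leverage point is a hypothetical choice), and the Huber constant $c = 1.345$ and the MAD scale are assumptions.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def gm_mallows(x, y, leverage_w, c=1.345, iters=100):
    """Mallows-type GM-estimate via IRLS: sum_i w_i * psi(r_i / s) * (1, x_i) = 0."""
    a, b = wls_line(x, y, [1.0] * len(x))  # ordinary least-squares start
    for _ in range(iters):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        med = statistics.median(r)
        s = statistics.median([abs(ri - med) for ri in r]) / 0.6745  # MAD scale
        # Huber weight psi(u)/u, multiplied by the fixed leverage weight.
        w = [lw * (1.0 if abs(ri) <= c * s else c * s / abs(ri))
             for lw, ri in zip(leverage_w, r)]
        a_new, b_new = wls_line(x, y, w)
        done = abs(a_new - a) + abs(b_new - b) < 1e-10
        a, b = a_new, b_new
        if done:
            break
    return a, b


x = [float(i) for i in range(20)]
y = [1 + 2 * xi + 0.3 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]
y[3] += 30.0   # vertical outliers
y[10] -= 25.0
x.append(50.0)
y.append(0.0)  # bad leverage point
lev_w = [1.0] * 20 + [0.05]  # hypothetical leverage weights
a, b = gm_mallows(x, y, lev_w)
print(round(a, 3), round(b, 3))
```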

8.
Summary. Many geophysical regression problems require the analysis of large data sets (more than $10^4$ values), and because the data may represent mixtures of concurrent natural processes with widely varying statistical properties, contamination of both response and predictor variables is common. Existing bounded influence or high breakdown point estimators frequently lack the ability to eliminate extremely influential data and/or the computational efficiency to handle large data sets. A new bounded influence estimator is proposed that combines high asymptotic efficiency for normal data, high breakdown point behaviour with contaminated data and computational simplicity for large data sets. The algorithm combines a standard M-estimator, which downweights data corresponding to extreme regression residuals, with removal of overly influential predictor values (leverage points) on the basis of the statistics of the hat matrix diagonal elements. For this purpose, the exact distribution of the hat matrix diagonal elements $p_{ii}$ for complex multivariate Gaussian predictor data is shown to be $\beta(p_{ii}; m, N - m)$, where $N$ is the number of data and $m$ is the number of parameters. Real geophysical data from an auroral zone magnetotelluric study that exhibit severe outlier and leverage point contamination are used to illustrate the estimator's performance. The examples also demonstrate the utility of examining both the residual and the hat matrix distributions through quantile–quantile plots to diagnose robust regression problems.
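The hat-matrix screening step can be sketched for a straight-line design ($m = 2$), where $p_{ii} = 1/N + (x_i - \bar x)^2 / \sum_k (x_k - \bar x)^2$ and the diagonal sums to $m$. For integer shape parameters the beta CDF equals a binomial tail, $I_t(a, b) = P(\mathrm{Bin}(a+b-1, t) \ge a)$, so a $\beta(m, N-m)$ quantile can be found by bisection with the standard library alone. The beta law is the paper's result for complex Gaussian predictors; using it as a cutoff for the real-valued design below, and the 0.999 level, are assumptions made for illustration.

```python
import math


def hat_diagonals_line(x):
    """Hat-matrix diagonal p_ii for the design [1, x_i] (m = 2 parameters)."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]


def beta_cdf_int(t, a, b):
    """Regularised incomplete beta I_t(a, b) for integer a, b via a binomial tail."""
    n = a + b - 1
    return sum(math.comb(n, k) * t ** k * (1 - t) ** (n - k) for k in range(a, n + 1))


def beta_quantile_int(q, a, b, iters=100):
    """Quantile of Beta(a, b) with integer shapes, by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = [float(i) for i in range(49)] + [200.0]  # one extreme design point
h = hat_diagonals_line(x)
m, N = 2, len(x)
cutoff = beta_quantile_int(0.999, m, N - m)
flagged = [i for i, hii in enumerate(h) if hii > cutoff]
print(round(sum(h), 6), round(cutoff, 4), flagged)
```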

9.
The penalized maximum likelihood estimator (PMLE) has been widely used for variable selection in high-dimensional data. Various penalty functions have been employed for this purpose, e.g., the Lasso, the weighted Lasso and the smoothly clipped absolute deviation (SCAD) penalty. However, the PMLE can be very sensitive to outliers in the data, especially outliers in the covariates (leverage points). To overcome this disadvantage, the penalized maximum trimmed likelihood estimator (PMTLE) is proposed to estimate the unknown parameters robustly. The PMTLE is computed with the same techniques as the PMLE, but the estimation is based on subsamples only. The breakdown point properties of the PMTLE are discussed using the notion of $d$-fullness. The performance of the proposed estimator is evaluated in a simulation study for the classical multiple linear and Poisson linear regression models.
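For the linear model with Gaussian errors, trimming the likelihood amounts to trimming squared residuals, so an $L_1$-penalized version can be sketched as a lasso fitted on an $h$-subset that is refined by concentration (C-) steps: refit on the $h$ observations with the smallest squared residuals until the subset stops changing. This is a hypothetical illustration, not the authors' implementation; the coordinate-descent solver, the penalty level `lam`, $h$, the number of random starts and the simulated data are all assumptions.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Lasso with unpenalized intercept: min (1/2n)||y - b0 - Xb||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]  # centring removes b0
    resid = [yi - ym for yi in y]  # residuals for b = 0
    sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if sq[j] == 0.0:
                continue
            z = sum(Xc[i][j] * resid[i] for i in range(n)) / n + sq[j] * b[j]
            step = soft_threshold(z, lam) / sq[j] - b[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                b[j] += step
    return ym - sum(m * bj for m, bj in zip(xm, b)), b


def trimmed_lasso(X, y, lam, h, n_starts=5, c_steps=10, seed=0):
    """Sparse trimmed fit: lasso on h-subsets refined by C-steps; best objective wins."""
    rng = random.Random(seed)
    n = len(X)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(range(n), h)
        for _ in range(c_steps):
            b0, b = lasso_cd([X[i] for i in subset], [y[i] for i in subset], lam)
            r2 = [(y[i] - b0 - sum(bj * xj for bj, xj in zip(b, X[i]))) ** 2 for i in range(n)]
            new_subset = sorted(range(n), key=r2.__getitem__)[:h]
            if set(new_subset) == set(subset):
                break  # C-step converged: the subset no longer changes
            subset = new_subset
        objective = sum(sorted(r2)[:h]) / (2 * h) + lam * sum(abs(v) for v in b)
        if best is None or objective < best[0]:
            best = (objective, b0, b)
    return best[1], best[2]


rng = random.Random(3)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [1 + 3 * r[0] - 2 * r[2] + 0.1 * rng.gauss(0, 1) for r in X]
for i in range(4):
    y[i] += 50.0  # outliers in the response
b0, b = trimmed_lasso(X, y, lam=0.05, h=30)
print(round(b0, 3), [round(v, 3) for v in b])
```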

10.
Visuri et al. (2000) ("Sign and rank covariance matrices", J. Stat. Plann. Inference 91: 557–575) proposed a technique for robust covariance matrix estimation based on different notions of multivariate sign and rank. Among these, the spatial-rank-based covariance matrix estimator that uses a robust scale estimator is especially appealing because of its high robustness, computational ease and good efficiency. It is also orthogonally equivariant under any distribution and affinely equivariant under elliptically symmetric distributions. In this paper, we study the robustness properties of the estimator with respect to two measures: the breakdown point and the influence function. More specifically, the upper bound of the finite-sample breakdown point can be attained by a proper choice of univariate robust scale estimator. The influence functions for the eigenvalues and eigenvectors of the estimator are derived and are found to be bounded under some assumptions. Moreover, finite-sample efficiency comparisons with the popular robust MCD, M- and S-estimators are reported.
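A minimal bivariate sketch of the spatial-rank construction, assuming the following form of the estimator: spatial ranks $R(x_i) = n^{-1}\sum_j (x_i - x_j)/\lVert x_i - x_j\rVert$, eigenvectors taken from the rank covariance matrix $n^{-1}\sum_i R(x_i)R(x_i)^\top$, and eigenvalues replaced by squared robust scales (here the normalised MAD) of the data projected onto those eigenvectors. The choice of MAD, the closed-form $2\times 2$ eigenvector formula and the simulated data are assumptions for illustration.

```python
import math
import random
import statistics


def spatial_sign(dx, dy):
    """u / ||u||, with the convention sign(0) = 0."""
    r = math.hypot(dx, dy)
    return (0.0, 0.0) if r == 0.0 else (dx / r, dy / r)


def rank_covariance(points):
    """Entries (a, b, c) of the spatial rank covariance matrix [[a, b], [b, c]]."""
    n = len(points)
    a = b = c = 0.0
    for xi, yi in points:
        rx = ry = 0.0
        for xj, yj in points:
            sx, sy = spatial_sign(xi - xj, yi - yj)
            rx += sx
            ry += sy
        rx, ry = rx / n, ry / n  # spatial rank of point i
        a += rx * rx
        b += rx * ry
        c += ry * ry
    return a / n, b / n, c / n


def mad_scale(values):
    """Median absolute deviation, normalised to estimate a normal standard deviation."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values]) / 0.6745


def rank_based_covariance(points):
    """Eigenvectors from the rank covariance, eigenvalues from squared MAD of projections."""
    a, b, c = rank_covariance(points)
    theta = 0.5 * math.atan2(2 * b, a - c)  # principal-axis angle of a symmetric 2x2 matrix
    axes = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for ux, uy in axes:
        lam = mad_scale([ux * x + uy * y for x, y in points]) ** 2
        cov[0][0] += lam * ux * ux
        cov[0][1] += lam * ux * uy
        cov[1][1] += lam * uy * uy
    cov[1][0] = cov[0][1]
    return cov


rng = random.Random(7)
pts = []
for _ in range(400):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    pts.append((g1, 0.8 * g1 + 0.6 * g2))  # unit variances, covariance 0.8
pts += [(20.0, -20.0)] * 20  # 20 identical outliers
cov = rank_based_covariance(pts)
print(cov)
```

The 20 points at (20, −20) would make the classical covariance strongly negative, whereas this sketch still returns a clearly positive off-diagonal entry.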

11.
A class of trimmed linear conditional estimators based on regression quantiles is introduced for the linear regression model. This class serves as a robust analogue of the non-robust linear unbiased estimators. Asymptotic analysis then shows that the trimmed least squares estimator based on regression quantiles (Koenker and Bassett, 1978) is the best in this class in terms of asymptotic covariance matrices. The class of trimmed linear conditional estimators contains the Mallows-type bounded influence trimmed means (see De Jongh et al., 1988) and trimmed instrumental variables estimators. A large-sample methodology based on the trimmed instrumental variables estimator is also provided for confidence ellipsoids and hypothesis testing.
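For a straight-line model with small $n$, the Koenker–Bassett trimmed least squares idea can be sketched exactly: some optimal regression-quantile line passes through two observations, so enumerating all pairs and minimising the check loss $\rho_\tau(u) = u(\tau - \mathbb{1}\{u < 0\})$ gives $\hat\beta(\tau)$. Observations lying strictly between the $\alpha$ and $1-\alpha$ regression-quantile lines are kept and refitted by ordinary least squares. The brute-force $O(n^3)$ search, the strict-inequality tie rule and the data are assumptions for illustration; practical code would solve the linear program instead.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))


def regression_quantile(x, y, tau):
    """Exact tau-th regression quantile line, searching lines through two points."""
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            obj = sum(check_loss(yk - a - b * xk, tau) for xk, yk in zip(x, y))
            if best is None or obj < best[0]:
                best = (obj, a, b)
    return best[1], best[2]


def ols_line(x, y):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def trimmed_ls(x, y, alpha):
    """Keep points strictly between the alpha and (1 - alpha) quantile lines, then OLS."""
    a_lo, b_lo = regression_quantile(x, y, alpha)
    a_hi, b_hi = regression_quantile(x, y, 1 - alpha)
    kept = [(xi, yi) for xi, yi in zip(x, y) if a_lo + b_lo * xi < yi < a_hi + b_hi * xi]
    kx, ky = zip(*kept)
    return ols_line(list(kx), list(ky))


x = [float(i) for i in range(30)]
y = [2 + 0.5 * xi + 0.5 * ((7 * i) % 5 - 2) for i, xi in enumerate(x)]
y[5] += 40.0   # gross outliers
y[20] -= 40.0
a, b = trimmed_ls(x, y, alpha=0.1)
print(round(a, 3), round(b, 3))
```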

12.
We obtain the possible limit distributions of unbiased estimators of functions of the parameter of a natural exponential family. The limit distribution depends on $j$, the order of the first non-zero derivative at the true (but usually unknown) value of the parameter. We show that if $j \geq 2$, the UMVU and maximum likelihood estimators are not asymptotically equivalent.
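A standard example makes the $j \ge 2$ case concrete (the example is not taken from the paper): for $N(\mu, 1)$ data, take $g(\mu) = \mu^2$, with MLE $\bar x^2$ and UMVU estimator $\bar x^2 - 1/n$. At $\mu = 0$ we have $g'(0) = 0$ and $g''(0) \ne 0$, so $j = 2$; then $n\bar x^2 \sim \chi^2_1$ exactly, while $n(\bar x^2 - 1/n) \sim \chi^2_1 - 1$, so the two estimators have different limits on the $n$ scale. The Monte Carlo sketch below checks this with arbitrary choices of $n$, number of replications and seed.

```python
import random
import statistics


def mle_and_umvu_of_mu_squared(sample):
    """For N(mu, 1) data: MLE xbar^2 and UMVU xbar^2 - 1/n of g(mu) = mu^2."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    return xbar ** 2, xbar ** 2 - 1 / n


def scaled_errors(mu, n, reps, seed=0):
    """Monte Carlo draws of n * (T - mu^2) for both estimators (n-scaling suits j = 2)."""
    rng = random.Random(seed)
    mle_err, umvu_err = [], []
    for _ in range(reps):
        mle, umvu = mle_and_umvu_of_mu_squared([rng.gauss(mu, 1) for _ in range(n)])
        mle_err.append(n * (mle - mu ** 2))
        umvu_err.append(n * (umvu - mu ** 2))
    return mle_err, umvu_err


mle_err, umvu_err = scaled_errors(mu=0.0, n=50, reps=2000)
print(statistics.fmean(mle_err), statistics.fmean(umvu_err))
```

The two scaled errors differ by exactly 1 in every replication, so their limits on this scale cannot coincide.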

13.
To perform regression analysis in high dimensions, lasso or ridge estimation is a common choice. However, these methods have been shown not to be robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. It can also be used to compute the asymptotic variance and the mean squared error (MSE). In this paper we compute the influence function, the asymptotic variance and the MSE for penalized M-estimators and the sparse LTS estimator. The asymptotic bias of the estimators makes the calculations non-standard. We show that only M-estimators whose loss function has a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
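The bounded-derivative condition can be illustrated in the simplest location case, a simplification of the regression setting studied in the paper: Tukey's sensitivity curve $SC(t) = n\,[T(x_1, \dots, x_{n-1}, t) - T(x_1, \dots, x_{n-1})]$ grows without bound for the mean (squared loss, unbounded derivative) but levels off for a Huber M-estimator (bounded $\psi$). The Huber constant, the fixed MAD scale and the evenly spaced sample are assumptions.

```python
import statistics


def huber_location(xs, scale, c=1.345, iters=200):
    """Huber M-estimate of location with a fixed scale, computed by IRLS."""
    mu = statistics.median(xs)
    for _ in range(iters):
        w = [1.0 if abs(x - mu) <= c * scale else c * scale / abs(x - mu) for x in xs]
        mu_new = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        done = abs(mu_new - mu) < 1e-12
        mu = mu_new
        if done:
            break
    return mu


def sensitivity_curve(estimator, base, t):
    """Tukey's sensitivity curve: n * (T(base + [t]) - T(base))."""
    n = len(base) + 1
    return n * (estimator(base + [t]) - estimator(base))


base = [-1.8 + 0.2 * k for k in range(19)]
med = statistics.median(base)
scale = statistics.median([abs(v - med) for v in base]) / 0.6745  # fixed MAD scale


def huber(xs):
    return huber_location(xs, scale)


for t in (2.0, 10.0, 100.0, 1000.0):
    print(t, sensitivity_curve(statistics.fmean, base, t), sensitivity_curve(huber, base, t))
```

For the mean the curve equals $t$ here because the base sample has mean zero; for the Huber estimate it stops changing once $t$ lies more than $c$ scale units from the estimate.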

14.
In this paper we consider weighted generalized-signed-rank estimators of nonlinear regression coefficients. The generalization allows us to include popular estimators such as the least squares and least absolute deviations estimators, but by itself it does not yield bounded influence estimators. Adding weights results in estimators with a bounded influence function. We establish the conditions needed for the consistency and asymptotic normality of the proposed estimator and discuss how weight functions can be chosen to achieve a bounded influence function. Real-life examples and Monte Carlo simulation experiments demonstrate the robustness and efficiency of the proposed estimator. An example shows that the weighted signed-rank estimator can be useful for detecting outliers in nonlinear regression. The Canadian Journal of Statistics 40: 172–189; 2012 © 2012 Statistical Society of Canada

15.
Detecting multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians because of the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-referenced data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

16.
Selecting relevant predictor variables is an important problem in multiple linear regression. Variable selection methods based on the ordinary least squares estimator fail to select the set of relevant variables in the presence of outliers and leverage points. In this article, we propose a new robust variable selection criterion and establish its consistency. The performance of the proposed method is evaluated through a simulation study and real data.

17.
We propose a strongly $\sqrt{n}$-consistent simulation-based estimator for generalized linear mixed models. The estimator is constructed from the first two marginal moments of the response variables, and it allows the random effects to have any parametric distribution (not necessarily normal). Consistency and asymptotic normality of the proposed estimator are derived under fairly general regularity conditions. We also demonstrate that the estimator has a bounded influence function and is robust against outliers in the data. A bias correction technique is proposed to reduce the finite-sample bias in the estimation of variance components. The methodology is illustrated through an application to the well-known seizure count data and some simulation studies.

18.
Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets from many processes routinely contain 10% or more outliers. Unfortunately, such outliers typically render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they lose power significantly as the dimension and the number of outliers increase. Although there have been recent advances in methods that detect multiple outliers, improvements are needed in regression estimators that fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners with a tool for the model-building process that protects against the severe impact of multiple outliers.

19.
This paper focuses on inference for the normal mixture model with unequal variances. A feature of the model is the flexibility of its density shape, but this flexibility makes the likelihood function unbounded and the maximum likelihood estimator excessively sensitive to outliers. A modified likelihood approach suggested by Basu et al. [1998, Biometrika 85, 549–559] can overcome these drawbacks. It is shown that the modified likelihood function is bounded above under a mild condition on the mixing proportions and that the resulting estimator is robust to outliers. The relationship between robustness and efficiency is investigated, and an adaptive method for selecting the tuning parameter of the modified likelihood is suggested, based on a robust model selection criterion and cross-validation. An EM-like algorithm is also constructed. Numerical studies are presented to evaluate the performance. The robust method is applied to single nucleotide polymorphism typing for the purposes of outlier detection and clustering.
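A heavily simplified sketch of the Basu et al. (1998) idea (density power divergence), for a single normal component with known $\sigma$ rather than a mixture: the estimating equation $\sum_i f_\mu(x_i)^{\alpha}(x_i - \mu) = 0$ becomes a weighted mean with weights $\exp\{-\alpha (x_i - \mu)^2 / (2\sigma^2)\}$, so distant points receive almost no weight. The single-component setting, $\alpha = 0.5$, $\sigma = 1$ and the data are assumptions; the paper's mixture estimator, EM-like algorithm and tuning-parameter selection are not reproduced.

```python
import math
import statistics


def dpd_location(xs, sigma, alpha=0.5, iters=200):
    """Density power divergence location estimate for N(mu, sigma^2), sigma known.

    Solves sum_i f(x_i)^alpha * (x_i - mu) = 0 by fixed-point iteration; the
    normalising constant of f cancels, leaving Gaussian-shaped weights.
    """
    mu = statistics.median(xs)
    for _ in range(iters):
        w = [math.exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
        mu_new = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        done = abs(mu_new - mu) < 1e-12
        mu = mu_new
        if done:
            break
    return mu


data = [4.0 + 0.1 * k for k in range(21)] + [50.0] * 4  # 4 gross outliers
mu_hat = dpd_location(data, sigma=1.0)
print(round(mu_hat, 6), round(statistics.fmean(data), 3))
```

The sample mean is dragged above 12 by the four outliers, while the weighted estimate stays at the centre of the clean data.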

20.
The maximum likelihood approach is the most frequently employed approach for inference in linear mixed models. However, it relies on the assumption that the random effects and the within-subject errors are normally distributed, and it lacks robustness against outliers. This article proposes a semiparametric estimation approach for linear mixed models. The approach is based on the first two marginal moments of the response variable and does not require any parametric distributional assumptions for the random effects or error terms. Consistency and asymptotic normality of the estimator are derived under fairly general conditions. In addition, we show that the proposed estimator has a bounded influence function and a redescending property, so it is robust to outliers. The methodology is illustrated through an application to the well-known Framingham cholesterol data. The finite-sample behavior and the robustness properties of the proposed estimator are evaluated through extensive simulation studies.
