Similar Literature
20 similar documents retrieved.
1.
The BCH procedure introduced by Billor, Chatterjee, and Hadi for fitting linear models was found to be inefficient for y-outliers in the presence of a high perturbation level. We propose to modify the first step of the BCH procedure so that the robust distances are computed on the matrix Z = (y, X) of the basic subset. The performance of the present note procedure (PNP), as compared to the BCH procedure and the ordinary least-squares (OLS) method, was studied by processing several datasets used in the literature for robust regression and by performing a Monte Carlo experiment. PNP performs better, particularly with datasets having high perturbation.
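To illustrate the kind of robust-distance screening described above, the following sketch computes robust Mahalanobis distances on the joint matrix Z = (y, X) using a minimum covariance determinant fit. It is a minimal stand-in, not the BCH/PNP algorithm itself, and the chi-square cut-off rule is our own assumption.

```python
# Minimal sketch: flag rows with large robust Mahalanobis distances on Z = (y, X).
# This is NOT the BCH/PNP procedure; it only illustrates the robust-distance idea.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)
y[:5] += 10.0                              # a few planted y-outliers

Z = np.column_stack([y, X])                # work on the joint matrix Z = (y, X)
mcd = MinCovDet(random_state=0).fit(Z)
d2 = mcd.mahalanobis(Z)                    # squared robust distances
cutoff = chi2.ppf(0.975, df=Z.shape[1])    # conventional chi-square cut-off (assumption)
print("flagged rows:", np.where(d2 > cutoff)[0])
```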

2.
In this paper the most commonly used diagnostic criteria for the identification of outliers or leverage points in the ordinary regression model are reviewed. Their use in the context of the errors-in-variables (e.v.) linear model is discussed and evidence is given that under the e.v. model assumptions the distinction between outliers and leverage points no longer exists.

3.
4.
    
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex and computer intensive. With a modification of the Gaussian paradigm, taking into consideration outliers and leverage points, we propose an iteratively weighted least squares method which gives robust fits. The procedure is illustrated by applying it to data sets which have been previously used to illustrate robust regression methods. It is hoped that this simple, effective and accessible method will find its use in statistical practice.
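The abstract does not give the weighting formulas, but the general shape of an iteratively (re)weighted least-squares fit can be sketched as follows, here with Huber-type weights and a MAD scale as stand-in choices rather than the authors' exact scheme.

```python
# Minimal IRLS sketch with Huber-type weights (a stand-in; not the article's exact method).
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=50, tol=1e-8):
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]      # OLS starting values
    for _ in range(n_iter):
        r = y - X1 @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # robust MAD scale
        u = r / (c * s)
        w = np.where(np.abs(u) <= 1.0, 1.0, 1.0 / np.abs(u))      # Huber weights
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X1, sw * y, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = 2.0 + X @ np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=80)
y[:6] += 15.0                                         # planted response outliers
print(irls_huber(X, y))                               # close to (2, 1, -1) despite outliers
```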

5.
Both the least squares estimator and M-estimators of regression coefficients are susceptible to distortion when high leverage points occur among the predictor variables in a multiple linear regression model. In this article a weighting scheme which enables one to bound the leverage values of a weighted matrix of predictor variables is proposed. Bounded-leverage weighting of the predictor variables followed by M-estimation of the regression coefficients is shown to be effective in protecting against distortion due to extreme predictor-variable values, extreme response values, or outlier-induced multicollinearities. Bounded-leverage estimators can also protect against distortion by small groups of high leverage points.
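The article's precise weighting scheme is not reproduced here; the sketch below only conveys the two-stage idea with a simplified rule of our own: shrink rows whose hat-matrix leverage exceeds a 2p/n bound, then apply an M-estimator to the weighted data.

```python
# Sketch of the bounded-leverage idea (our own simplified weighting, not the article's scheme):
# downweight rows with large hat-matrix leverage, then M-estimate the coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
x = rng.normal(size=n)
x[:3] = 12.0                                    # high-leverage points ...
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:3] -= 40.0                                   # ... that are also bad in the response

X = sm.add_constant(x)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # leverages
bound = 2.0 * X.shape[1] / n                    # common 2p/n rule of thumb (assumption)
w = np.minimum(1.0, np.sqrt(bound / h))         # shrink high-leverage rows

fit = sm.RLM(y * w, X * w[:, None], M=sm.robust.norms.HuberT()).fit()  # M-estimation
print(fit.params)
```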

6.
    
Support Vector Regression (SVR) is gaining in popularity in the detection of outliers and classification problems in high-dimensional data (HDD) as this technique does not require the data to be of full rank. In real applications, most data are high dimensional. Classification of high-dimensional data is needed in applied sciences, in particular, as it is important to discriminate cancerous cells from non-cancerous cells. It is also imperative that outliers are identified before constructing a model on the relationship between the dependent and independent variables to avoid misleading interpretations about the fitting of a model. The standard SVR and the μ-ε-SVR are able to detect outliers; however, they are computationally expensive. The fixed parameters support vector regression (FP-ε-SVR) was put forward to remedy this issue. However, the FP-ε-SVR using ε-SVR is not very successful in identifying outliers. In this article, we propose an alternative method to detect outliers, i.e., by employing nu-SVR. The merit of our proposed method is confirmed by three real examples and the Monte Carlo simulation. The results show that our proposed nu-SVR method is very successful in identifying outliers under a variety of situations, and with less computational running time.
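The article's procedure is not reproduced here; as a minimal sketch of the underlying idea, one can fit a nu-SVR to high-dimensional data and flag observations with unusually large residuals. The 2.5 * MAD flagging rule below is our own assumption.

```python
# Minimal sketch: fit nu-SVR and flag observations with large residuals.
# The flagging rule (2.5 * MAD) is our assumption, not the article's procedure.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(3)
n, p = 50, 100                      # high-dimensional setting: p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(scale=0.5, size=n)
y[:4] += 20.0                       # planted outliers

model = NuSVR(nu=0.5, C=10.0, kernel="linear").fit(X, y)
resid = y - model.predict(X)
scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))   # robust MAD scale
print("flagged:", np.where(np.abs(resid) > 2.5 * scale)[0])
```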

7.
Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.

8.
Least trimmed squares (LTS) provides a parametric family of high breakdown estimators in regression with better asymptotic properties than least median of squares (LMS) estimators. We adapt the forward search algorithm of Atkinson (1994) to LTS and provide methods for determining the amount of data to be trimmed. We examine the efficiency of different trimming proportions by simulation and demonstrate the increasing efficiency of parameter estimation as larger proportions of data are fitted using the LTS criterion. Some standard data examples are analysed. One shows that LTS provides more stable solutions than LMS.
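For readers unfamiliar with the LTS criterion itself, the sketch below fits it in a bare-bones way via random elemental starts and concentration steps (a FAST-LTS flavour). It is not the forward search algorithm adapted in the paper; it only shows what "minimise the sum of the h smallest squared residuals" looks like in code.

```python
# Bare-bones LTS sketch via random starts + concentration steps (not the forward search).
import numpy as np

def lts_fit(X, y, h, n_starts=200, n_csteps=10, seed=0):
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)            # elemental start
        beta = np.linalg.lstsq(X1[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):                             # concentration steps
            r2 = (y - X1 @ beta) ** 2
            keep = np.argsort(r2)[:h]                         # h smallest squared residuals
            beta = np.linalg.lstsq(X1[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X1 @ beta) ** 2)[:h].sum()         # LTS objective
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([3.0, -2.0]) + rng.normal(scale=0.4, size=100)
y[:15] += 25.0                                                # 15% contamination
print(lts_fit(X, y, h=75))                                    # trim 25% of the observations
```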

9.
The geometric mean (GM) has growing and wider application in statistical data analysis as a measure of central tendency. It is generally believed that the GM is less sensitive to outliers than the arithmetic mean (AM), but we suspect that, like the AM, the GM may also suffer a serious setback in the presence of outliers, especially when multiple outliers occur in a data set. So far as we know, not much work has been done on the robustness of the GM. In quest of a simple robust measure of central tendency, we propose the geometric median (GMed) in this paper. We show that the classical GM has a breakdown point of only 0%, while it is 50% for the proposed GMed. Numerical examples also support our claim that the proposed GMed is unaffected by the presence of multiple outliers and can maintain the highest possible 50% breakdown. We then develop a new method for the identification of multiple outliers based on the proposed GMed. A variety of numerical examples show that the proposed method can successfully identify all potential outliers while the traditional GM fails to do so.
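The abstract does not give a formula for the GMed, so the sketch below assumes, purely for illustration, that it is the exponential of the median of the logged data; it then contrasts the stability of the two measures when a single gross outlier is planted, which is the breakdown phenomenon the abstract refers to.

```python
# Illustration only: the abstract does not define GMed, so we *assume* exp(median(log x))
# for positive data, and compare its stability with the classical geometric mean.
import numpy as np

rng = np.random.default_rng(5)
x = rng.lognormal(mean=1.0, sigma=0.3, size=50)

def gm(x):                       # classical geometric mean
    return np.exp(np.mean(np.log(x)))

def gmed(x):                     # assumed robust counterpart: exp of the median of logs
    return np.exp(np.median(np.log(x)))

x_bad = x.copy()
x_bad[0] = 1e9                   # one gross outlier

print("clean data  : GM =", round(gm(x), 3), "  GMed =", round(gmed(x), 3))
print("with outlier: GM =", round(gm(x_bad), 3), "  GMed =", round(gmed(x_bad), 3))
```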

10.
    
Trimming principles play an important role in robust statistics. However, their use for clustering typically requires some preliminary information about the contamination rate and the number of groups. We suggest a fresh approach to trimming that does not rely on this knowledge and that proves to be particularly suited for solving problems in robust cluster analysis. Our approach replaces the original K‐population (robust) estimation problem with K distinct one‐population steps, which take advantage of the good breakdown properties of trimmed estimators when the trimming level exceeds the usual bound of 0.5. In this setting, we prove that exact affine equivariance is lost on one hand but, on the other hand, an arbitrarily high breakdown point can be achieved by “anchoring” the robust estimator. We also support the use of adaptive trimming schemes, in order to infer the contamination rate from the data. A further bonus of our methodology is its ability to provide a reliable choice of the usually unknown number of groups.

11.
    
Andreas Artemiou. Statistics, 2013, 47(5): 1037-1051
In this paper, we combine adaptively weighted large margin classifiers with Support Vector Machine (SVM)-based dimension reduction methods to create dimension reduction methods robust to the presence of extreme outliers. We discuss estimation and asymptotic properties of the algorithm. The good performance of the new algorithm is demonstrated through simulations and real data analysis.

12.
13.
The added variable plot is a commonly used tool for assessing the accuracy of a normal linear model. This plot is often used to evaluate the effect of adding an explanatory variable to the model and to detect possibly high leverage points or influential observations on the added variable. However, this type of plot becomes questionable once the normal distributional assumptions are violated. In this article, we extend the robust likelihood technique introduced by Royall and Tsou [11] to propose a robust added variable plot. The validity of this diagnostic plot requires no knowledge of the true underlying distributions so long as their second moments exist. The usefulness of the robust graphical approach is demonstrated through a few illustrations and simulations.
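For reference, the classical added variable plot for a predictor x_j plots the residuals of y on the remaining predictors against the residuals of x_j on the remaining predictors. The sketch below builds that classical construction; the robust-likelihood version proposed in the article is not reproduced.

```python
# Classical added variable plot for one predictor (the article's robust version is not
# reproduced here; this only shows the construction the robust plot builds on).
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
n = 80
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)

j = 0                                             # predictor whose added effect we examine
others = sm.add_constant(np.delete(X, j, axis=1))
ry = sm.OLS(y, others).fit().resid                # y adjusted for the other predictors
rx = sm.OLS(X[:, j], others).fit().resid          # x_j adjusted for the other predictors

plt.scatter(rx, ry)
plt.xlabel("residuals of x_j on the other predictors")
plt.ylabel("residuals of y on the other predictors")
plt.title("Added variable plot for x_j")
plt.show()
```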

14.
    
Single-case deletion regression diagnostics have been used widely to discover unusual data points, but such approaches can fail in the presence of multiple unusual data points and as a result of masking. We propose a new approach to the use of single-case deletion diagnostics that involves applying these diagnostics to delete-2 and delete-3 jackknife replicates of the data, and considering the percentage of times among these replicates that points are flagged as unusual as an indicator of their influence. By considering replicates that exclude certain collections of points, subtle masking effects can be uncovered.
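A small sketch of the flag-percentage idea on delete-2 replicates follows. The single-case diagnostic used here is Cook's distance with the conventional 4/n cut-off, which is our choice for illustration rather than the paper's specification.

```python
# Sketch: apply a single-case diagnostic (Cook's distance, 4/n cut-off) to every delete-2
# replicate and record how often each point is flagged. Diagnostic and cut-off are our choices.
import numpy as np
import statsmodels.api as sm
from itertools import combinations

rng = np.random.default_rng(7)
n = 25
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.4, size=n)
x[0], y[0] = 4.0, -6.0            # two unusual points that partially mask
x[1], y[1] = 4.2, -6.5            # each other in single-case diagnostics

flag_counts = np.zeros(n)
appearances = np.zeros(n)
for drop in combinations(range(n), 2):            # all delete-2 jackknife replicates
    keep = np.array([i for i in range(n) if i not in drop])
    fit = sm.OLS(y[keep], sm.add_constant(x[keep])).fit()
    cooks = fit.get_influence().cooks_distance[0]
    flag_counts[keep[cooks > 4.0 / len(keep)]] += 1
    appearances[keep] += 1

pct = 100 * flag_counts / appearances             # % of replicates flagging each point
print(np.round(pct, 1))
```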

15.
16.
A simple analytical expression is derived for leverage in ridge regression. Leverage is shown to be a monotonically decreasing function of the value of the ridge parameter. This reduction in leverage is greatest for those observations lying substantially in the direction of the minor principal axes. Thus, ridge estimation copes with outliers in regressor space by downweighting their influence. A brief illustration is provided.
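A quick numerical illustration of the stated result: taking ridge leverage to be the diagonal of H(k) = X (X'X + kI)^{-1} X', the values shrink as the ridge parameter k grows, most strongly for observations aligned with the minor principal axes. This is a check of the phenomenon, not the paper's derivation.

```python
# Numerical check: ridge leverages, diag(X (X'X + kI)^{-1} X'), decrease as k grows.
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(30, 4))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=30)   # near-collinear column -> small principal axis

def ridge_leverage(X, k):
    p = X.shape[1]
    return np.diag(X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T))

for k in [0.0, 0.5, 2.0, 10.0]:
    h = ridge_leverage(X, k)
    print(f"k = {k:5.1f}   max leverage = {h.max():.3f}   mean leverage = {h.mean():.3f}")
```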

17.
Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate basis or on a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

18.
Quantile regression (QR) models have received a great deal of attention in both the theoretical and applied statistical literature. In this paper we propose support vector quantile regression (SVQR) with monotonicity restriction, which is easily obtained via the dual formulation of the optimization problem. We also provide the generalized approximate cross validation method for choosing the hyperparameters which affect the performance of the proposed SVQR. The experimental results for the synthetic and real data sets confirm the successful performance of the proposed model.

19.
This paper deals with a formal identification of outliers in regression based on tests of hypotheses. The hypothesis is not the standard one but is based on performance criteria that relate to the coefficient estimation and predictive capabilities of the model. The criteria include the trace of the mean square error matrix on the coefficients and the integrated mean square error of prediction. Both the mean shift outlier model and the variance inflation model are discussed.

20.
The commonly made assumption that all stochastic error terms in the linear regression model share the same variance (homoskedasticity) is often violated in practical applications, especially when they are based on cross-sectional data. As a precaution, a number of practitioners choose to base inference on the parameters that index the model on tests whose statistics employ asymptotically correct standard errors, i.e., standard errors that are asymptotically valid whether or not the errors are homoskedastic. In this paper, we use numerical integration methods to evaluate the finite-sample performance of tests based on different (alternative) heteroskedasticity-consistent standard errors. Emphasis is placed on a few recently proposed heteroskedasticity-consistent covariance matrix estimators. Overall, the results favor the HC4 and HC5 heteroskedasticity-robust standard errors. We also consider the use of restricted residuals when constructing asymptotically valid standard errors. Our results show that the only test that clearly benefits from such a strategy is the HC0 test.
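For context, a heteroskedasticity-consistent covariance estimator replaces the usual OLS variance formula with a sandwich whose middle term weights the squared residuals. The sketch below computes HC0 (the basic White estimator) and an HC4-style variant; the HC4 exponent rule delta_i = min(4, n*h_i/p) is our reading of the literature and should be verified against the original reference before relying on it.

```python
# Sketch of sandwich (heteroskedasticity-consistent) standard errors with numpy.
# HC0 is the basic White estimator; the HC4-style weighting uses delta_i = min(4, n*h_i/p),
# which is our reading of the literature (verify before relying on it).
import numpy as np

def hc_standard_errors(X, y):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.diag(X @ XtX_inv @ X.T)                 # leverages

    def sandwich(omega):                           # (X'X)^-1 X' diag(omega) X (X'X)^-1
        cov = XtX_inv @ (X.T * omega) @ X @ XtX_inv
        return np.sqrt(np.diag(cov))

    se_hc0 = sandwich(e**2)
    delta = np.minimum(4.0, n * h / p)
    se_hc4 = sandwich(e**2 / (1.0 - h) ** delta)
    return beta, se_hc0, se_hc4

rng = np.random.default_rng(9)
n = 100
x = rng.uniform(0, 5, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 + 0.4 * x)   # heteroskedastic errors
X = np.column_stack([np.ones(n), x])
beta, se0, se4 = hc_standard_errors(X, y)
print("beta:", beta)
print("HC0 se:", se0)
print("HC4-style se:", se4)
```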
