Similar Articles
20 similar articles found.
1.
The resistance of least absolute values (L1) estimators to outliers and their robustness to heavy-tailed distributions make these estimators useful alternatives to the usual least squares estimators. The recent development of efficient algorithms for L1 estimation in linear models has permitted their use in practical data analysis. Although in general the L1 estimators are not unique, there are a number of properties they all share. The set of all L1 estimators for a given model and data set can be characterized as the convex hull of some extreme estimators. Properties of the extreme estimators and of the L1-estimate set are considered.
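
L1 estimation of a linear model is the same problem as median (0.5-quantile) regression, so one accessible way to compute an L1 fit in practice is through a quantile-regression routine. A minimal sketch using statsmodels (my illustration, not one of the specialized L1 algorithms the abstract refers to):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.standard_t(df=1, size=50)  # t(1) = Cauchy errors: heavy tails

beta_l1 = sm.QuantReg(y, X).fit(q=0.5).params      # median regression = an L1 estimator
beta_l2 = sm.OLS(y, X).fit().params                # least squares, for contrast
print(beta_l1, beta_l2)
```

On heavy-tailed errors like these, the L1 fit typically stays near the true coefficients while the least squares fit is pulled around by the extreme observations.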

2.

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers will typically render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they lose power significantly with increasing dimension and number of outliers. Although there have been recent advances in methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, gives researchers and practitioners a tool for the model-building process that protects against the severe impact of multiple outliers.
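
The abstract's compound estimator is not spelled out here, but the failure mode it targets, outliers that are extreme in the regressor space, is easy to stage. A hedged sketch contrasting OLS with an off-the-shelf robust alternative (RANSAC, which is not the authors' method, only an illustration of the problem):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
x[:10] = 10.0            # 10% of points pushed far out in the regressor space...
y[:10] = -20.0           # ...and mismatched in y: high-leverage outliers

print(LinearRegression().fit(x, y).coef_[0])                          # dragged toward the outliers
print(RANSACRegressor(random_state=0).fit(x, y).estimator_.coef_[0])  # stays near the true slope 2
```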

3.
A number of efficient computer codes are available for the simple linear L1 regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L1 regression problem, with and without computing and utilizing a least squares solution.
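
One simple way to see why the least squares solution helps: it is usually close to the L1 solution and so makes a good warm start. A sketch using an iteratively reweighted least squares (IRLS) approximation to the L1 fit (my illustration; the study above compared specialized codes, not this scheme):

```python
import numpy as np

def lad_irls(X, y, beta0=None, n_iter=50, eps=1e-8):
    """Approximate L1 (least absolute deviations) regression by IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta0 is None else beta0
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)   # weights that turn the LS step into an L1 step
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta
```

Here the default start is the least squares estimate itself; beginning near the L1 optimum typically cuts the number of iterations, which is the kind of saving the compared programs exploit.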

4.
Consider the linear regression model y = β0 + β1x + ε in the usual notation. It is argued that the class of ordinary ridge estimators obtained by shrinking the least squares estimator by the matrix (X'X + kI)^(-1)X'X is sensitive to outliers in the y-variable. To overcome this problem, we propose a new class of ridge-type M-estimators, obtained by shrinking an M-estimator (instead of the least squares estimator) by the same matrix. Since the optimal value of the ridge parameter k is unknown, we suggest a procedure for choosing it adaptively. In a reasonably large-scale simulation study with a particular M-estimator, we found that if the conditions are such that the M-estimator is more efficient than the least squares estimator, then the corresponding ridge-type M-estimator proposed here is better, in terms of a mean squared error criterion, than the ordinary ridge estimator with k chosen suitably. An example illustrates that the estimators proposed here are less sensitive to outliers in the y-variable than ordinary ridge estimators.
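
The construction is direct to express in code: compute an M-estimate, then shrink it with the same matrix that ordinary ridge applies to the OLS estimate. A minimal sketch with a Huber M-estimator and a fixed k (the paper chooses k adaptively; this sketch does not):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=100)              # heavy-tailed errors

beta_m = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit().params  # Huber M-estimate
k = 0.5                                          # fixed here; chosen adaptively in the paper
G = np.linalg.inv(X.T @ X + k * np.eye(X.shape[1])) @ (X.T @ X)
beta_ridge_m = G @ beta_m                        # shrink the M-estimate, not the OLS estimate
print(beta_ridge_m)
```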

5.
Regression analysis aims to estimate the approximate relationship between the response variable and the explanatory variables. This can be done using classical methods such as ordinary least squares. Unfortunately, these methods are very sensitive to anomalous points, often called outliers, in the data set. The main contribution of this article is to propose a new version of the Generalized M-estimator that provides good resistance against vertical outliers and bad leverage points. The advantage of this method over existing methods is that it does not reduce the weight of good leverage points, which increases the efficiency of the estimator. To achieve this goal, the fixed-parameter support vector regression technique is used to identify and downweight outliers and bad leverage points. The effectiveness of the proposed estimator is investigated using real and simulated data sets.
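
A loose sketch of the weighting idea only (not the paper's exact GM-estimator): fit a fixed-parameter SVR, treat points with large SVR residuals as suspects, and downweight only those in a weighted least squares fit, so good leverage points keep full weight. The cutoff 2.5 and the MAD scale are my conventional choices:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.3, size=n)
y[:5] += 15.0                                    # vertical outliers

svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(x, y)  # fixed-parameter SVR
r = y - svr.predict(x)
s = np.median(np.abs(r)) / 0.6745                # robust residual scale (MAD)
w = np.where(np.abs(r) <= 2.5 * s, 1.0, 2.5 * s / np.abs(r))  # only suspects lose weight

X = np.column_stack([np.ones(n), x[:, 0]])       # weighted LS with those weights
beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
print(beta)                                      # close to (1, 2)
```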

6.
To perform regression analysis in high dimensions, lasso and ridge estimation are common choices. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error (MSE). In this paper we compute the influence function, the asymptotic variance and the MSE for penalized M-estimators and the sparse LTS estimator. The asymptotic bias of the estimators makes the calculations non-standard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
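
The lasso's unbounded influence can be seen with a finite-sample sensitivity curve: perturb a single response value and watch the coefficient estimates drift without bound. A small sketch (alpha=0.1 is an arbitrary choice of mine):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)

def coefs(y_):
    return Lasso(alpha=0.1).fit(X, y_).coef_

base = coefs(y)
for shift in [0.0, 5.0, 20.0, 100.0]:    # move one response further and further out
    y_c = y.copy()
    y_c[0] += shift
    print(shift, np.round(coefs(y_c) - base, 3))  # the change keeps growing with the shift
```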

7.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization; the coordinate descent algorithm, in particular, has been shown to be effective in sparse regression modeling. Although the algorithm shows remarkable performance in solving optimization problems for L1-type regularization, it suffers from outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
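
To make the weak point concrete: plain coordinate descent updates each coefficient by soft-thresholding an inner product of a predictor with the partial residuals, and a single gross residual contaminates that inner product. One way to robustify the update, sketched below with Huber-type weights on the residuals (my illustration, not the authors' principal-components-space algorithm):

```python
import numpy as np

def robust_cd_lasso(X, y, lam, c=1.345, n_sweeps=100):
    """Coordinate descent for the lasso with Huber-type weights on the residuals."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                                         # residuals for beta = 0
    for _ in range(n_sweeps):
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)                 # downweight gross residuals
        for j in range(p):
            r_j = r + X[:, j] * beta[j]                  # partial residuals for coordinate j
            z = np.sum(w * X[:, j] * r_j) / n            # weighted inner product
            d = np.sum(w * X[:, j] ** 2) / n
            beta_j = np.sign(z) * max(abs(z) - lam, 0.0) / d   # soft-thresholding update
            r = r_j - X[:, j] * beta_j
            beta[j] = beta_j
    return beta
```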

8.
Estimation of a regression function from independent and identically distributed data is considered. The L2 error, with integration with respect to the design measure, is used as the error criterion. Upper bounds on the L2 error of least squares regression estimates are presented which bound the error of the estimate when the values of the independent and dependent variables in the sample supplied to the estimate are perturbed by some arbitrary procedure. The bounds are applied to analyze regression-based Monte Carlo methods for pricing American options in the presence of errors in modelling the price process.

9.
It is well known that when the true values of the independent variable are unobservable due to measurement error, the least squares estimator for a regression model is biased and inconsistent. When repeated observations on each x_i are taken, consistent estimators for the linear-plateau model can be formed. The repeated observations are required to classify each observation to the appropriate line segment. Two cases of repeated observations are treated in detail. First, when a single value of y_i is observed with the repeated observations of x_i, the least squares estimator using the mean of the repeated x_i observations is consistent and asymptotically normal. Second, when repeated observations on the pair (x_i, y_i) are taken, the least squares estimator is inconsistent, but two consistent estimators are proposed: one consistently estimates the bias of the least squares estimator and adjusts accordingly; the second is the least squares estimator using the mean of the repeated observations on each pair.
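
The bias the abstract starts from, and the way averaging repeated measurements shrinks it, are easy to check numerically. A sketch for a plain linear model (the plateau-model estimators in the paper are more involved; with unit-variance measurement error the slope is attenuated by the factor 1/(1 + 1/m) for the mean of m repeats):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 5000, 4                       # n subjects, m repeated measurements of x
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=n)
x_obs = x_true[:, None] + rng.normal(size=(n, m))   # unit-variance measurement error

print(np.polyfit(x_obs[:, 0], y, 1)[0])         # one copy: slope ~ 2/(1 + 1)   = 1.0
print(np.polyfit(x_obs.mean(axis=1), y, 1)[0])  # mean of 4: slope ~ 2/(1 + 1/4) = 1.6
```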

10.
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex and computer-intensive. With a modification of the Gaussian paradigm that takes outliers and leverage points into consideration, we propose an iteratively weighted least squares method which gives robust fits. The procedure is illustrated by applying it to data sets which have previously been used to illustrate robust regression methods. It is hoped that this simple, effective and accessible method will find its use in statistical practice.
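
The abstract does not give the paper's exact weighting scheme; the sketch below is a generic iteratively weighted least squares fit in the same spirit, with Huber-type weights for large standardized residuals and an extra downweighting of high-leverage rows (the cutoffs c = 2.5 and 2p/n are my conventional choices, not the authors'):

```python
import numpy as np

def robust_iwls(X, y, c=2.5, n_iter=20):
    """Iteratively weighted LS, downweighting big residuals and high-leverage rows."""
    n, p = X.shape
    h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)   # leverages (hat-matrix diagonal)
    lev_w = np.minimum(1.0, (2.0 * p / n) / h)             # soften rows with h > 2p/n
    w = np.ones(n)
    for _ in range(n_iter):
        wt = w * lev_w
        beta = np.linalg.solve((X.T * wt) @ X, (X.T * wt) @ y)
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12          # robust residual scale
        w = np.minimum(1.0, c / np.maximum(np.abs(r / s), 1e-12))  # Huber-type weights
    return beta
```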

11.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial because of their responsibility for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and this creates problems in the correct identification of such cases. Attempts have been made to identify HLP based on median distances from the mean, but since these are designed for the identification of a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression in which the suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation.
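
The component in question is the IRLS weight w_i = p_i(1 - p_i): logistic leverage takes the form h_i = w_i x_i'(X'WX)^(-1)x_i, and at extreme x the fitted probability approaches 0 or 1, so w_i (and with it h_i) can collapse. A sketch that computes these leverages directly so the effect can be inspected:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
X = sm.add_constant(x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x))))

res = sm.Logit(y, X).fit(disp=0)
p = res.predict(X)
w = p * (1.0 - p)                           # the component Hosmer and Lemeshow flag
A = np.linalg.inv(X.T @ (w[:, None] * X))   # (X'WX)^{-1}
h = w * np.einsum("ij,jk,ik->i", X, A, X)   # logistic leverages h_i
# The most extreme x need not get the largest h: w collapses near p = 0 or 1
print(np.column_stack([x, h])[np.argsort(np.abs(x))[-5:]])
```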

12.
The varying coefficient model (VCM) is an important generalization of the linear regression model, and many existing estimation procedures for VCM were built on the L2 loss, which is popular for its mathematical beauty but is not robust to non-normal errors and outliers. In this paper, we address both the robustness and the efficiency of estimation and variable selection procedures based on a convex combination of the L1 and L2 losses, instead of the quadratic loss alone, for VCM. Using the local linear modeling method, the asymptotic normality of the estimator is derived, and a useful method is proposed for selecting the weight of the composite L1 and L2 loss. The variable selection procedure is then given by combining local kernel smoothing with the adaptive group LASSO. With appropriate selection of the tuning parameters by the Bayesian information criterion (BIC), the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the new method is investigated through simulation studies and the analysis of body fat data. Numerical studies show that the new method performs better than, or at least as well as, the least squares-based method in terms of both robustness and efficiency for variable selection.

13.
In this paper we seek designs and estimators which are optimal in some sense for multivariate linear regression on cubes and simplices when the true regression function is unknown. More precisely, we assume that the unknown true regression function is the sum of a linear part plus some contamination orthogonal to the set of all linear functions in the L2 norm with respect to Lebesgue measure. The contamination is assumed bounded in absolute value, and it is shown that the usual designs for multivariate linear regression on cubes and simplices and the usual least squares estimators minimize the supremum, over all possible contaminations, of the expected mean squared error. Additional results for extrapolation and interpolation, among other things, are discussed. For suitable loss functions, optimal designs are found to have support on the extreme points of our design space.

14.
Statistics that usually accompany the regression model do not provide insight into the quality of the data or the potential influence of individual observations on the estimates. In this study, the Q2 statistic is used as a criterion for detecting influential observations or outliers. The statistic is derived from the jackknifed residuals, whose sum of squares is generally known as the prediction sum of squares, or PRESS. This article compares R2 with Q2 and suggests that the latter be used as part of the data-quality check. It is shown, for two separate data sets obtained from regional cost-of-living and U.S. food industry studies, that in the presence of outliers the Q2 statistic can be negative, because it is sensitive to the choice of regressors and the inclusion of influential observations. Once the outliers are dropped from the sample, the discrepancy between the Q2 and R2 values is negligible.
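
Both quantities are cheap to compute from a single least squares fit, since the jackknifed (leave-one-out) residuals follow from the ordinary residuals via the leverages. A sketch under the usual definition Q2 = 1 - PRESS/SST, assuming X carries an intercept column:

```python
import numpy as np

def press_q2(X, y):
    """PRESS from jackknifed residuals via the leverages, and the Q2 statistic."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                      # ordinary residuals
    h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)  # leverages (hat-matrix diagonal)
    press = np.sum((e / (1.0 - h)) ** 2)                  # sum of squared jackknifed residuals
    sst = np.sum((y - y.mean()) ** 2)
    return press, 1.0 - press / sst                       # Q2
```

Because PRESS is not bounded by SST, a few influential points can push Q2 below zero, which is exactly the diagnostic signal the article exploits.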

15.
In this paper, we suggest a least squares procedure for determining the number of upper outliers in an exponential sample by minimizing the sample mean squared error. Moreover, the method can reduce the masking or "swamping" effects. We have also found that the least squares procedure is easier and simpler to compute than the test procedure Tk suggested by Zhang (1998) for determining the number of upper outliers, since Zhang's procedure requires the complicated null distribution of Tk. We give three practical examples and a simulated example to illustrate the procedures, and simulation studies are presented to show the advantages of the proposed method. Finally, the proposed least squares procedure can also determine the number of upper outliers in other continuous univariate distributions (for example, Pareto, Gumbel, Weibull, etc.).

16.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-known data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

17.
There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it suffers from outliers because it relies on penalized least squares. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function while imposing an L1-type penalty, the elastic net. We also introduce efficient bootstrap information criteria for choosing the optimal regularization parameters and a constant used in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

18.
The bootstrap is a methodology for estimating standard errors. The idea is to use a Monte Carlo simulation experiment based on a nonparametric estimate of the error distribution. The main objective of this article is to demonstrate the use of the bootstrap to attach standard errors to coefficient estimates in a second-order autoregressive model fitted by least squares and maximum likelihood estimation. Additionally, a comparison of the bootstrap and the conventional methodology is made. As it turns out, the conventional asymptotic formulae (for both the least squares and maximum likelihood estimates) appear to overestimate the true standard errors. But there are two problems: (i) the first two observations y1 and y2 have been fixed, and (ii) the residuals have not been inflated. After these two factors are accounted for in the trial and bootstrap experiment, both the conventional maximum likelihood and bootstrap estimates of the standard errors appear to perform quite well.
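
A sketch of the residual bootstrap for an AR(2) fitted by least squares, touching both points above: the resampled residuals are inflated by a simple degrees-of-freedom factor (my choice; the article's exact adjustment may differ), and each bootstrap series is restarted from the observed first two values, which is the fixing of y1 and y2 the abstract flags:

```python
import numpy as np

def ar2_bootstrap_se(y, B=999, seed=0):
    """Residual-bootstrap standard errors for an AR(2) fitted by least squares."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])  # [1, lag-1, lag-2]
    beta = np.linalg.lstsq(X, y[2:], rcond=None)[0]
    e = y[2:] - X @ beta
    e = (e - e.mean()) * np.sqrt((n - 2) / (n - 5))  # inflate residuals (3 parameters)
    boot = np.empty((B, 3))
    for b in range(B):
        eb = rng.choice(e, size=n - 2, replace=True)
        yb = np.empty(n)
        yb[:2] = y[:2]                               # restart from the observed y1, y2
        for t in range(2, n):
            yb[t] = beta[0] + beta[1] * yb[t - 1] + beta[2] * yb[t - 2] + eb[t - 2]
        Xb = np.column_stack([np.ones(n - 2), yb[1:-1], yb[:-2]])
        boot[b] = np.linalg.lstsq(Xb, yb[2:], rcond=None)[0]
    return boot.std(axis=0, ddof=1)

rng = np.random.default_rng(1)                       # usage: simulate an AR(2), then bootstrap
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.1 + 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
print(ar2_bootstrap_se(y))
```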

19.
When the data contain outliers or come from populations with heavy-tailed distributions, which appear very often in spatiotemporal data, estimation methods based on least squares (L2) will not perform well, and more robust estimation methods are required. In this article, we propose local linear estimation for spatiotemporal models based on least absolute deviation (L1) and derive the asymptotic distributions of the L1-estimators under some mild conditions imposed on the spatiotemporal process. The simulation results for two examples, with outliers and a heavy-tailed distribution respectively, show that the L1-estimators perform better than the L2-estimators.

20.
The question of nonuniqueness of the least absolute values (L1) regression is discussed, and examples are given of situations where the L1 regression is unique and where it has 2, 3 and 4 limiting positions. A method is proposed for finding a compromise regression line when the L1 regression is not unique. Suggestions are made for further research.
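
Nonuniqueness is easy to exhibit. For the four corners of the unit square, any line whose fitted values at x = 0 and x = 1 both lie in [0, 1] attains the minimal total absolute deviation of 2, so the L1 solution set is a whole convex family of lines, consistent with the convex-hull characterization in item 1:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0])   # four corners of the unit square
y = np.array([0.0, 1.0, 0.0, 1.0])

def l1_loss(a, b):                   # total absolute deviation of the line y = a + b*x
    return np.sum(np.abs(y - (a + b * x)))

for a, b in [(0.5, 0.0), (0.0, 1.0), (1.0, -1.0), (0.25, 0.5)]:
    print((a, b), l1_loss(a, b))     # every one of these distinct lines scores 2.0, the minimum
```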
