Similar Articles
20 similar articles found.
1.
The resistance of least absolute values (L1) estimators to outliers and their robustness to heavy-tailed distributions make these estimators useful alternatives to the usual least squares estimators. The recent development of efficient algorithms for L1 estimation in linear models has permitted their use in practical data analysis. Although in general the L1 estimators are not unique, there are a number of properties they all share. The set of all L1 estimators for a given model and data set can be characterized as the convex hull of some extreme estimators. Properties of the extreme estimators and of the L1-estimate set are considered.
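
L1 estimation of a linear model is the same problem as median (0.5-quantile) regression, so one accessible way to compute an L1 fit in practice is through a quantile-regression routine. A minimal sketch using statsmodels (my illustration, not one of the specialized L1 algorithms the abstract refers to):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.standard_t(df=1, size=50)  # t(1) = Cauchy errors: heavy tails

beta_l1 = sm.QuantReg(y, X).fit(q=0.5).params      # median regression = an L1 estimator
beta_l2 = sm.OLS(y, X).fit().params                # least squares, for contrast
print(beta_l1, beta_l2)
```

On heavy-tailed errors like these, the L1 fit typically stays near the true coefficients while the least squares fit is pulled around by the extreme observations.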

2.

Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers will typically render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they lose power significantly with increasing dimension and number of outliers. Although there have been recent advances in methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, gives researchers and practitioners a tool for the model-building process that protects against the severe impact of multiple outliers.
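
The abstract's compound estimator is not spelled out here, but the failure mode it targets, outliers that are extreme in the regressor space, is easy to stage. A hedged sketch contrasting OLS with an off-the-shelf robust alternative (RANSAC, which is not the authors' method, only an illustration of the problem):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)
x[:10] = 10.0            # 10% of points pushed far out in the regressor space...
y[:10] = -20.0           # ...and mismatched in y: high-leverage outliers

print(LinearRegression().fit(x, y).coef_[0])                          # dragged toward the outliers
print(RANSACRegressor(random_state=0).fit(x, y).estimator_.coef_[0])  # stays near the true slope 2
```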

3.
A number of efficient computer codes are available for the simple linear L1 regression problem. However, a number of these codes can be made more efficient by utilizing the least squares solution. In fact, a couple of available computer programs already do so.

We report the results of a computational study comparing several openly available computer programs for solving the simple linear L1 regression problem, with and without computing and utilizing a least squares solution.
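
One simple way to see why the least squares solution helps: it is usually close to the L1 solution and so makes a good warm start. A sketch using an iteratively reweighted least squares (IRLS) approximation to the L1 fit (my illustration; the study above compared specialized codes, not this scheme):

```python
import numpy as np

def lad_irls(X, y, beta0=None, n_iter=50, eps=1e-8):
    """Approximate L1 (least absolute deviations) regression by IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta0 is None else beta0
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)   # weights that turn the LS step into an L1 step
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta
```

Here the default start is the least squares estimate itself; beginning near the L1 optimum typically cuts the number of iterations, which is the kind of saving the compared programs exploit.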

4.
Consider the linear regression model y = β0 + β1x + ε in the usual notation. It is argued that the class of ordinary ridge estimators obtained by shrinking the least squares estimator by the matrix (X'X + kI)^(-1)X'X is sensitive to outliers in the y-variable. To overcome this problem, we propose a new class of ridge-type M-estimators, obtained by shrinking an M-estimator (instead of the least squares estimator) by the same matrix. Since the optimal value of the ridge parameter k is unknown, we suggest a procedure for choosing it adaptively. In a reasonably large-scale simulation study with a particular M-estimator, we found that if the conditions are such that the M-estimator is more efficient than the least squares estimator, then the corresponding ridge-type M-estimator proposed here is better, in terms of a mean squared error criterion, than the ordinary ridge estimator with k chosen suitably. An example illustrates that the estimators proposed here are less sensitive to outliers in the y-variable than ordinary ridge estimators.
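
The construction is direct to express in code: compute an M-estimate, then shrink it with the same matrix that ordinary ridge applies to the OLS estimate. A minimal sketch with a Huber M-estimator and a fixed k (the paper chooses k adaptively; this sketch does not):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
X = sm.add_constant(x)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=100)              # heavy-tailed errors

beta_m = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit().params  # Huber M-estimate
k = 0.5                                          # fixed here; chosen adaptively in the paper
G = np.linalg.inv(X.T @ X + k * np.eye(X.shape[1])) @ (X.T @ X)
beta_ridge_m = G @ beta_m                        # shrink the M-estimate, not the OLS estimate
print(beta_ridge_m)
```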

5.
Regression analysis aims to estimate the approximate relationship between the response variable and the explanatory variables. This can be done using classical methods such as ordinary least squares. Unfortunately, these methods are very sensitive to anomalous points, often called outliers, in the data set. The main contribution of this article is to propose a new version of the Generalized M-estimator that provides good resistance against vertical outliers and bad leverage points. The advantage of this method over existing methods is that it does not reduce the weight of good leverage points, which increases the efficiency of the estimator. To achieve this goal, the fixed-parameter support vector regression technique is used to identify and downweight outliers and bad leverage points. The effectiveness of the proposed estimator is investigated using real and simulated data sets.
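
A loose sketch of the weighting idea only (not the paper's exact GM-estimator): fit a fixed-parameter SVR, treat points with large SVR residuals as suspects, and downweight only those in a weighted least squares fit, so good leverage points keep full weight. The cutoff 2.5 and the MAD scale are my conventional choices:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=0.3, size=n)
y[:5] += 15.0                                    # vertical outliers

svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(x, y)  # fixed-parameter SVR
r = y - svr.predict(x)
s = np.median(np.abs(r)) / 0.6745                # robust residual scale (MAD)
w = np.where(np.abs(r) <= 2.5 * s, 1.0, 2.5 * s / np.abs(r))  # only suspects lose weight

X = np.column_stack([np.ones(n), x[:, 0]])       # weighted LS with those weights
beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)
print(beta)                                      # close to (1, 2)
```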

6.
To perform regression analysis in high dimensions, lasso and ridge estimation are common choices. However, it has been shown that these methods are not robust to outliers. Therefore, alternatives such as penalized M-estimation or the sparse least trimmed squares (LTS) estimator have been proposed. The robustness of these regression methods can be measured with the influence function, which quantifies the effect of infinitesimal perturbations in the data. Furthermore, it can be used to compute the asymptotic variance and the mean squared error (MSE). In this paper we compute the influence function, the asymptotic variance and the MSE for penalized M-estimators and the sparse LTS estimator. The asymptotic bias of the estimators makes the calculations non-standard. We show that only M-estimators with a loss function with a bounded derivative are robust against regression outliers. In particular, the lasso has an unbounded influence function.
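
The lasso's unbounded influence can be seen with a finite-sample sensitivity curve: perturb a single response value and watch the coefficient estimates drift without bound. A small sketch (alpha=0.1 is an arbitrary choice of mine):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)

def coefs(y_):
    return Lasso(alpha=0.1).fit(X, y_).coef_

base = coefs(y)
for shift in [0.0, 5.0, 20.0, 100.0]:    # move one response further and further out
    y_c = y.copy()
    y_c[0] += shift
    print(shift, np.round(coefs(y_c) - base, 3))  # the change keeps growing with the shift
```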

7.
The L1-type regularization provides a useful tool for variable selection in high-dimensional regression modeling. Various algorithms have been proposed to solve optimization problems for L1-type regularization; the coordinate descent algorithm, in particular, has been shown to be effective in sparse regression modeling. Although the algorithm shows remarkable performance in solving optimization problems for L1-type regularization, it suffers from outliers, since the procedure is based on the inner product of predictor variables and partial residuals obtained in a non-robust manner. To overcome this drawback, we propose a robust coordinate descent algorithm, focusing especially on high-dimensional regression modeling based on the principal components space. We show that the proposed robust algorithm converges to the minimum value of its objective function. Monte Carlo experiments and real data analysis are conducted to examine the efficiency of the proposed robust algorithm. We observe that our robust coordinate descent algorithm performs effectively for high-dimensional regression modeling even in the presence of outliers.
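
To make the weak point concrete: plain coordinate descent updates each coefficient by soft-thresholding an inner product of a predictor with the partial residuals, and a single gross residual contaminates that inner product. One way to robustify the update, sketched below with Huber-type weights on the residuals (my illustration, not the authors' principal-components-space algorithm):

```python
import numpy as np

def robust_cd_lasso(X, y, lam, c=1.345, n_sweeps=100):
    """Coordinate descent for the lasso with Huber-type weights on the residuals."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                                         # residuals for beta = 0
    for _ in range(n_sweeps):
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # robust scale (MAD)
        u = np.abs(r / s)
        w = np.where(u <= c, 1.0, c / u)                 # downweight gross residuals
        for j in range(p):
            r_j = r + X[:, j] * beta[j]                  # partial residuals for coordinate j
            z = np.sum(w * X[:, j] * r_j) / n            # weighted inner product
            d = np.sum(w * X[:, j] ** 2) / n
            beta_j = np.sign(z) * max(abs(z) - lam, 0.0) / d   # soft-thresholding update
            r = r_j - X[:, j] * beta_j
            beta[j] = beta_j
    return beta
```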

8.
Estimation of a regression function from independent and identically distributed data is considered. The L2 error, with integration with respect to the design measure, is used as the error criterion. Upper bounds on the L2 error of least squares regression estimates are presented which bound the error of the estimate when the values of the independent and dependent variables in the sample supplied to the estimate are perturbed by some arbitrary procedure. The bounds are applied to analyze regression-based Monte Carlo methods for pricing American options in the presence of errors in modelling the price process.

9.
It is well known that when the true values of the independent variable are unobservable due to measurement error, the least squares estimator for a regression model is biased and inconsistent. When repeated observations on each x_i are taken, consistent estimators for the linear-plateau model can be formed. The repeated observations are required to classify each observation to the appropriate line segment. Two cases of repeated observations are treated in detail. First, when a single value of y_i is observed with the repeated observations of x_i, the least squares estimator using the mean of the repeated x_i observations is consistent and asymptotically normal. Second, when repeated observations on the pair (x_i, y_i) are taken, the least squares estimator is inconsistent, but two consistent estimators are proposed: one consistently estimates the bias of the least squares estimator and adjusts accordingly; the second is the least squares estimator using the mean of the repeated observations on each pair.
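
The bias the abstract starts from, and the way averaging repeated measurements shrinks it, are easy to check numerically. A sketch for a plain linear model (the plateau-model estimators in the paper are more involved; with unit-variance measurement error the slope is attenuated by the factor 1/(1 + 1/m) for the mean of m repeats):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 5000, 4                       # n subjects, m repeated measurements of x
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=n)
x_obs = x_true[:, None] + rng.normal(size=(n, m))   # unit-variance measurement error

print(np.polyfit(x_obs[:, 0], y, 1)[0])         # one copy: slope ~ 2/(1 + 1)   = 1.0
print(np.polyfit(x_obs.mean(axis=1), y, 1)[0])  # mean of 4: slope ~ 2/(1 + 1/4) = 1.6
```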

10.
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex and computer-intensive. With a modification of the Gaussian paradigm that takes outliers and leverage points into consideration, we propose an iteratively weighted least squares method which gives robust fits. The procedure is illustrated by applying it to data sets which have previously been used to illustrate robust regression methods. It is hoped that this simple, effective and accessible method will find its use in statistical practice.
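
The abstract does not give the paper's exact weighting scheme; the sketch below is a generic iteratively weighted least squares fit in the same spirit, with Huber-type weights for large standardized residuals and an extra downweighting of high-leverage rows (the cutoffs c = 2.5 and 2p/n are my conventional choices, not the authors'):

```python
import numpy as np

def robust_iwls(X, y, c=2.5, n_iter=20):
    """Iteratively weighted LS, downweighting big residuals and high-leverage rows."""
    n, p = X.shape
    h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)   # leverages (hat-matrix diagonal)
    lev_w = np.minimum(1.0, (2.0 * p / n) / h)             # soften rows with h > 2p/n
    w = np.ones(n)
    for _ in range(n_iter):
        wt = w * lev_w
        beta = np.linalg.solve((X.T * wt) @ X, (X.T * wt) @ y)
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12          # robust residual scale
        w = np.minimum(1.0, c / np.maximum(np.abs(r / s), 1e-12))  # Huber-type weights
    return beta
```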

11.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial because of their responsibility for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and this creates problems in the correct identification of such cases. Attempts have been made to identify HLP based on median distances from the mean, but since these are designed for the identification of a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression in which the suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation.
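
The component in question is the IRLS weight w_i = p_i(1 - p_i): logistic leverage takes the form h_i = w_i x_i'(X'WX)^(-1)x_i, and at extreme x the fitted probability approaches 0 or 1, so w_i (and with it h_i) can collapse. A sketch that computes these leverages directly so the effect can be inspected:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(size=200)
X = sm.add_constant(x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * x))))

res = sm.Logit(y, X).fit(disp=0)
p = res.predict(X)
w = p * (1.0 - p)                           # the component Hosmer and Lemeshow flag
A = np.linalg.inv(X.T @ (w[:, None] * X))   # (X'WX)^{-1}
h = w * np.einsum("ij,jk,ik->i", X, A, X)   # logistic leverages h_i
# The most extreme x need not get the largest h: w collapses near p = 0 or 1
print(np.column_stack([x, h])[np.argsort(np.abs(x))[-5:]])
```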

12.
The varying coefficient model (VCM) is an important generalization of the linear regression model, and many existing estimation procedures for VCM were built on the L2 loss, which is popular for its mathematical beauty but is not robust to non-normal errors and outliers. In this paper, we address both the robustness and the efficiency of estimation and variable selection procedures based on a convex combination of the L1 and L2 losses, instead of the quadratic loss alone, for VCM. Using the local linear modeling method, the asymptotic normality of the estimator is derived, and a useful method is proposed for selecting the weight of the composite L1 and L2 loss. The variable selection procedure is then given by combining local kernel smoothing with the adaptive group LASSO. With appropriate selection of the tuning parameters by the Bayesian information criterion (BIC), the theoretical properties of the new procedure, including consistency in variable selection and the oracle property in estimation, are established. The finite sample performance of the new method is investigated through simulation studies and the analysis of body fat data. Numerical studies show that the new method performs better than, or at least as well as, the least squares-based method in terms of both robustness and efficiency for variable selection.

13.
In this paper we seek designs and estimators which are optimal in some sense for multivariate linear regression on cubes and simplices when the true regression function is unknown. More precisely, we assume that the unknown true regression function is the sum of a linear part plus some contamination orthogonal to the set of all linear functions in the L2 norm with respect to Lebesgue measure. The contamination is assumed bounded in absolute value, and it is shown that the usual designs for multivariate linear regression on cubes and simplices and the usual least squares estimators minimize the supremum, over all possible contaminations, of the expected mean squared error. Additional results for extrapolation and interpolation, among other things, are discussed. For suitable loss functions, optimal designs are found to have support on the extreme points of our design space.

14.
Statistics that usually accompany the regression model do not provide insight into the quality of the data or the potential influence of individual observations on the estimates. In this study, the Q2 statistic is used as a criterion for detecting influential observations or outliers. The statistic is derived from the jackknifed residuals, whose sum of squares is generally known as the prediction sum of squares, or PRESS. This article compares R2 with Q2 and suggests that the latter be used as part of the data-quality check. It is shown, for two separate data sets obtained from regional cost-of-living and U.S. food industry studies, that in the presence of outliers the Q2 statistic can be negative, because it is sensitive to the choice of regressors and the inclusion of influential observations. Once the outliers are dropped from the sample, the discrepancy between the Q2 and R2 values is negligible.
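
Both quantities are cheap to compute from a single least squares fit, since the jackknifed (leave-one-out) residuals follow from the ordinary residuals via the leverages. A sketch under the usual definition Q2 = 1 - PRESS/SST, assuming X carries an intercept column:

```python
import numpy as np

def press_q2(X, y):
    """PRESS from jackknifed residuals via the leverages, and the Q2 statistic."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                      # ordinary residuals
    h = np.sum((X @ np.linalg.inv(X.T @ X)) * X, axis=1)  # leverages (hat-matrix diagonal)
    press = np.sum((e / (1.0 - h)) ** 2)                  # sum of squared jackknifed residuals
    sst = np.sum((y - y.mean()) ** 2)
    return press, 1.0 - press / sst                       # Q2
```

Because PRESS is not bounded by SST, a few influential points can push Q2 below zero, which is exactly the diagnostic signal the article exploits.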

15.
In this paper, we suggest a least squares procedure for determining the number of upper outliers in an exponential sample by minimizing the sample mean squared error. Moreover, the method can reduce the masking or "swamping" effects. We have also found that the least squares procedure is easier and simpler to compute than the test procedure Tk suggested by Zhang (1998) for determining the number of upper outliers, since Zhang's procedure requires the complicated null distribution of Tk. We give three practical examples and a simulated example to illustrate the procedures, and simulation studies are presented to show the advantages of the proposed method. Finally, the proposed least squares procedure can also determine the number of upper outliers in other continuous univariate distributions (for example, Pareto, Gumbel, Weibull, etc.).

16.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several well-known data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

17.
There is currently much discussion about lasso-type regularized regression, which is a useful tool for simultaneous estimation and variable selection. Although lasso-type regularization has several advantages in regression modelling owing to its sparsity, it suffers from outliers because it relies on penalized least squares. To overcome this issue, we propose a robust lasso-type estimation procedure that uses robust criteria as the loss function while imposing an L1-type penalty, the elastic net. We also introduce efficient bootstrap information criteria for choosing the optimal regularization parameters and a constant used in outlier detection. Simulation studies and real data analysis are given to examine the efficiency of the proposed robust sparse regression modelling. We observe that our modelling strategy performs well in the presence of outliers.

18.
The bootstrap is a methodology for estimating standard errors. The idea is to use a Monte Carlo simulation experiment based on a nonparametric estimate of the error distribution. The main objective of this article is to demonstrate the use of the bootstrap to attach standard errors to coefficient estimates in a second-order autoregressive model fitted by least squares and maximum likelihood estimation. Additionally, a comparison of the bootstrap and the conventional methodology is made. As it turns out, the conventional asymptotic formulae (for both the least squares and maximum likelihood estimates) appear to overestimate the true standard errors. But there are two problems: (i) the first two observations y1 and y2 have been fixed, and (ii) the residuals have not been inflated. After these two factors are accounted for in the trial and bootstrap experiment, both the conventional maximum likelihood and bootstrap estimates of the standard errors appear to perform quite well.
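
A sketch of the residual bootstrap for an AR(2) fitted by least squares, touching both points above: the resampled residuals are inflated by a simple degrees-of-freedom factor (my choice; the article's exact adjustment may differ), and each bootstrap series is restarted from the observed first two values, which is the fixing of y1 and y2 the abstract flags:

```python
import numpy as np

def ar2_bootstrap_se(y, B=999, seed=0):
    """Residual-bootstrap standard errors for an AR(2) fitted by least squares."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])  # [1, lag-1, lag-2]
    beta = np.linalg.lstsq(X, y[2:], rcond=None)[0]
    e = y[2:] - X @ beta
    e = (e - e.mean()) * np.sqrt((n - 2) / (n - 5))  # inflate residuals (3 parameters)
    boot = np.empty((B, 3))
    for b in range(B):
        eb = rng.choice(e, size=n - 2, replace=True)
        yb = np.empty(n)
        yb[:2] = y[:2]                               # restart from the observed y1, y2
        for t in range(2, n):
            yb[t] = beta[0] + beta[1] * yb[t - 1] + beta[2] * yb[t - 2] + eb[t - 2]
        Xb = np.column_stack([np.ones(n - 2), yb[1:-1], yb[:-2]])
        boot[b] = np.linalg.lstsq(Xb, yb[2:], rcond=None)[0]
    return boot.std(axis=0, ddof=1)

rng = np.random.default_rng(1)                       # usage: simulate an AR(2), then bootstrap
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.1 + 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
print(ar2_bootstrap_se(y))
```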

19.
When the data contain outliers or come from populations with heavy-tailed distributions, which appear very often in spatiotemporal data, estimation methods based on least squares (L2) will not perform well, and more robust estimation methods are required. In this article, we propose local linear estimation for spatiotemporal models based on least absolute deviation (L1) and derive the asymptotic distributions of the L1-estimators under some mild conditions imposed on the spatiotemporal process. The simulation results for two examples, with outliers and a heavy-tailed distribution respectively, show that the L1-estimators perform better than the L2-estimators.

20.
The question of nonuniqueness of the least absolute values (L1) regression is discussed, and examples are given of situations where the L1 regression is unique and where it has 2, 3 and 4 limiting positions. A method is proposed for finding a compromise regression line when the L1 regression is not unique. Suggestions are made for further research.
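
Nonuniqueness is easy to exhibit. For the four corners of the unit square, any line whose fitted values at x = 0 and x = 1 both lie in [0, 1] attains the minimal total absolute deviation of 2, so the L1 solution set is a whole convex family of lines, consistent with the convex-hull characterization in item 1:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0, 1.0])   # four corners of the unit square
y = np.array([0.0, 1.0, 0.0, 1.0])

def l1_loss(a, b):                   # total absolute deviation of the line y = a + b*x
    return np.sum(np.abs(y - (a + b * x)))

for a, b in [(0.5, 0.0), (0.0, 1.0), (1.0, -1.0), (0.25, 0.5)]:
    print((a, b), l1_loss(a, b))     # every one of these distinct lines scores 2.0, the minimum
```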
