Ordinary least-squares (OLS) estimators for a linear model are very sensitive to unusual values in the design space or outliers among the y values. Even a single atypical value may have a large effect on the parameter estimates. This article reviews and describes some available and popular robust techniques, including some recently developed ones, and compares them in terms of breakdown point and efficiency. In addition, we use a simulation study and a real data application to compare the performance of existing robust methods under different scenarios.
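
The sensitivity described in the abstract above can be illustrated with a minimal sketch. The data values, the helper names `ols_fit` and `theil_sen_slope`, and the choice of the Theil-Sen slope (median of pairwise slopes) as the robust comparison are all assumptions made for illustration; the abstract does not say which robust estimators the article reviews.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def theil_sen_slope(x, y):
    """Median of all pairwise slopes (illustrative robust alternative)."""
    i, j = np.triu_indices(len(x), k=1)  # all index pairs with i < j
    return np.median((y[j] - y[i]) / (x[j] - x[i]))

# Hypothetical data: an exact line y = 1 + 2x, then one corrupted response.
x = np.arange(10, dtype=float)
y_clean = 1.0 + 2.0 * x
y_bad = y_clean.copy()
y_bad[-1] += 100.0  # a single outlier among the y values

slope_clean = ols_fit(x, y_clean)[1]
slope_bad = ols_fit(x, y_bad)[1]
slope_robust = theil_sen_slope(x, y_bad)
print(slope_clean, slope_bad, slope_robust)
```

With these made-up values, the single corrupted response pulls the OLS slope well away from 2, while the median of pairwise slopes stays at 2 because most pairs do not involve the corrupted point.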

Generalized linear mixed models (GLMMs) are widely used to analyse non-normal response data with extra-variation, but non-robust estimators are still routinely used. We propose robust methods for maximum quasi-likelihood and residual maximum quasi-likelihood estimation to limit the influence of outlying observations in GLMMs. The estimation procedure parallels the development of robust estimation methods in linear mixed models, but with adjustments in the dependent variable and the variance component. The methods proposed are applied to three data sets and a comparison is made with the nonparametric maximum likelihood approach. When applied to a set of epileptic seizure data, the methods proposed have the desired effect of limiting the influence of outlying observations on the parameter estimates. Simulation shows that one of the residual maximum quasi-likelihood proposals has a smaller bias than those of the other estimation methods. We further discuss the equivalence of two GLMM formulations when the response variable follows an exponential family. Their extensions to robust GLMMs and their comparative advantages in modelling are described. Some possible modifications of the robust GLMM estimation methods are given to provide further flexibility for applying the method.

In this article, we investigate a new estimation approach for the partially linear single-index model based on the modal regression method, where the nonparametric function is estimated by the penalized spline method. Moreover, we develop an expectation-maximization (EM)-type algorithm and establish the large sample properties of the proposed estimation method. A distinguishing characteristic of the newly proposed estimator is its robustness against outliers, achieved by introducing an additional tuning parameter that can be selected automatically from the observed data. Simulation studies and a real data example are used to evaluate the finite-sample performance, and the results show that the newly proposed method works very well.

In regression analysis, the restricted principal components regression estimator has been proposed to deal with the problem of multicollinearity. In this paper, we compare the restricted principal components regression estimator, the principal components regression estimator, and the ordinary least-squares estimator with each other under Pitman's closeness criterion. We show that the restricted principal components regression estimator is always superior to the principal components regression estimator; that, under certain conditions, the restricted principal components regression estimator is superior to the ordinary least-squares estimator; and that, under certain conditions, the principal components regression estimator is superior to the ordinary least-squares estimator under Pitman's closeness criterion.

This paper compares several Stein-like estimation methods for estimating regression parameters. The criterion function was the mean-squared error of prediction and the parameter of interest was the mean of the response variable at the sampled values of the control variables. Large sample simulation techniques were used to evaluate the mean-squared error of the predictions. The parameters of interest were varied systematically over wide ranges.

Mixtures of factor analyzers (MFAs) are widely used to cluster high-dimensional data. However, the traditional estimation method is based on normality assumptions for the random terms and is thus sensitive to outliers. In this article, we introduce a robust estimation procedure for MFAs using the trimmed likelihood estimator. We use a simulation study and a real data application to demonstrate the robustness of the trimmed estimation procedure and compare it with the traditional normality-based maximum likelihood estimate.

In this paper, a new method for robust principal component analysis (PCA) is proposed. PCA is a widely used tool for dimension reduction without substantial loss of information. However, classical PCA is vulnerable to outliers because of its dependence on the empirical covariance matrix. To avoid this weakness, several alternative approaches based on robust scatter matrices have been suggested. A popular choice is ROBPCA, which combines projection pursuit ideas with robust covariance estimation via a variance maximization criterion. Our approach is based on the fact that PCA can be formulated as a regression-type optimization problem, which is the main difference from previous approaches. The proposed robust PCA is derived by replacing the squared loss function with a robust penalty function, the Huber loss function. A practical algorithm is proposed to carry out the optimization, and the convergence properties of the algorithm are investigated. Results from a simulation study and a real data example demonstrate the promising empirical properties of the proposed method.
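
The Huber loss mentioned in the abstract above is quadratic for small residuals and linear for large ones. The following is a minimal sketch of that loss only, not of the authors' PCA algorithm; the function name `huber_loss` and the default threshold `delta=1.345` (a commonly used tuning constant) are assumptions, since the abstract does not state the threshold used.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss of residual(s) r with threshold delta (assumed value).

    Quadratic (0.5 * r**2) when |r| <= delta,
    linear (delta * (|r| - 0.5 * delta)) otherwise.
    """
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

residuals = np.array([-10.0, -0.5, 0.0, 0.5, 10.0])  # hypothetical residuals
squared = 0.5 * residuals**2
robust = huber_loss(residuals, delta=1.0)
print(squared)
print(robust)
```

For the large residuals the Huber loss grows linearly rather than quadratically, which is why a fit based on it is less dominated by a few outlying observations than a squared-loss fit.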

This paper proposes robust regression to solve the problem of outliers in seemingly unrelated regression (SUR) models. The authors present an adaptation of S‐estimators to SUR models. S‐estimators are robust, have a high breakdown point and are much more efficient than other robust regression estimators commonly used in practice. Furthermore, modifications to Ruppert's algorithm allow a fast evaluation of them in this context. The classical example of U.S. corporations is revisited, and it appears that the procedure gives an interesting insight into the problem.

This paper studies robust estimation of a multivariate regression model using kernel weighted local linear regression. A robust estimation procedure is proposed for estimating the regression function and its partial derivatives. The proposed estimators are jointly asymptotically normal and attain the nonparametric optimal convergence rate. One-step approximations to the robust estimators are introduced to reduce the computational burden. The one-step local M-estimators are shown to achieve the same efficiency as the fully iterative local M-estimators as long as the initial estimators are good enough. The proposed estimators inherit the excellent edge-effect behavior of the local polynomial methods in the univariate case and at the same time overcome the disadvantages of the local least-squares based smoothers. Simulations are conducted to demonstrate the performance of the proposed estimators. Real data sets are analyzed to illustrate the practical utility of the proposed methodology. This work was supported by the National Natural Science Foundation of China (Grant No. 10471006).

The author presents a robust F-test for comparing nested linear models. It is suggested that the approach will be attractive to practitioners because it is based on the familiar F-statistic and corresponds to the common practice of reporting F-statistics after removing obvious outliers. It is calibrated in terms of a real parameter that can be directly interpreted as the willingness of the data analyst to remove observations, and the sensitivity of the F-statistic to this parameter is easily examined. The procedure is evaluated with a simulation study where a scale mixture distribution is used to generate outliers. The procedure is also applied to some data where the occurrence of an outlier is confounded with the significance of a regression term. This provides a comparison of two competing models for the data: one removing an outlier and the other including an additional regression term instead.

The k-means algorithm is one of the most common nonhierarchical clustering methods. It aims to construct clusters that minimize the within-cluster sum of squared distances. However, like most estimators defined in terms of objective functions depending on global sums of squares, the k-means procedure is not robust with respect to atypical observations in the data. Alternative techniques have thus been introduced in the literature, e.g., the k-medoids method. The k-means and k-medoids methodologies are particular cases of the generalized k-means procedure. In this article, the focus is on the error rate these clustering procedures achieve when the data are expected to be distributed according to a mixture distribution. Two different definitions of the error rate are considered, depending on the data at hand. It is shown that contamination may make one of these two error rates decrease even under optimal models. The consequences of this are emphasized through a comparison of the influence functions and breakdown points of these error rates.
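
The lack of robustness described in the abstract above can be seen on a single cluster. This sketch is an illustration only: the points and the helper names `mean_center` and `medoid` are assumptions, and it compares just the two kinds of cluster center, not the error rates the article studies.

```python
import numpy as np

def mean_center(points):
    """Center that minimizes the within-cluster sum of squared distances."""
    return points.mean(axis=0)

def medoid(points):
    """Data point that minimizes the sum of Euclidean distances to the others."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)  # pairwise distance matrix
    return points[dist.sum(axis=1).argmin()]

# Hypothetical cluster: four corners of the unit square plus its midpoint.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
# Same cluster with one atypical observation added.
contaminated = np.vstack([cluster, [[50.0, 50.0]]])

print(mean_center(cluster), medoid(cluster))
print(mean_center(contaminated), medoid(contaminated))
```

In this made-up example the atypical point drags the mean far outside the unit square, while the medoid remains the midpoint of the original cluster.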

This paper considers the sensitivity of chance constrained linear programming solutions where the coefficients of the left-hand side of a constraint function are estimated from a sample using multiple linear regression. The modified nonlinear constraint provides considerable assurance that the true, but unknown, stochastic linear constraint will be satisfied at a given level of probability for the conditions of the simulation herein. Ordinary least squares and least absolute value regression criteria are considered along with normal, uniform and double exponential distributions of error.

Principal components regression (PCR) is used to resolve the multicollinearity problem, but specification bias occurs because only the important principal components are selected for inclusion, which degrades the predictive ability of the model. We propose PCR in a nonparametric framework to address the multicollinearity problem while minimizing the specification bias that affects the predictive ability of the model. The simulation study illustrates that nonparametric PCR addresses the multicollinearity problem while retaining higher predictive ability relative to the parametric principal components regression model.

One approach to robustness in linear models is to assume that there is a mixture of standard and outlier observations with a different error variance for each class. For generalised linear models (GLMs) the mixture model approach is more difficult, as the error variance for many distributions has a fixed relationship to the mean. This model is extended to GLMs by redefining the classes so that the standard class is a standard GLM and the outlier class is an overdispersed GLM, obtained by including a random effect term in the linear predictor. The advantages of this method are that it can be extended to any model with a linear predictor and that outlier observations can be easily identified. Using simulation, the model is compared to an M-estimator and found to have improved bias and coverage. The method is demonstrated on three examples.

Partial linear varying coefficient models are often used in real data analysis because they offer a good balance between flexibility and parsimony. In this paper, we propose a robust adaptive model selection method based on rank regression, which performs simultaneous coefficient estimation and three types of selection, i.e., varying and constant effects selection and relevant variable selection. The new method is superior in robustness and efficiency because it inherits the advantages of the rank regression approach. Furthermore, consistency of the three types of selection and the oracle property of the estimator are established. Simulation studies also support our method.

The authors propose a new class of robust estimators for the parameters of a regression model in which the distribution of the error terms belongs to a class of exponential families including the log‐gamma distribution. These estimates, which are a natural extension of the MM‐estimates for ordinary regression, may combine simultaneously high asymptotic efficiency and a high breakdown point. The authors prove the consistency and derive the asymptotic normal distribution of these estimates. A Monte Carlo study allows them to assess the efficiency and robustness of these estimates for finite samples.

This article investigates the relevance of considering a large number of macroeconomic indicators to forecast the complete distribution of a variable. The baseline time series model is a semiparametric specification based on the quantile autoregressive (QAR) model that assumes that the quantiles depend on the lagged values of the variable. We then augment the time series model with macroeconomic information from a large dataset by including principal components or a subset of variables selected by LASSO. We forecast the distribution of the h-month growth rate for four economic variables from 1975 to 2011 and evaluate the forecast accuracy relative to a stochastic volatility model using the quantile score. The results for the output and employment measures indicate that the multivariate models outperform the time series forecasts, in particular at long horizons and in tails of the distribution, while for the inflation variables the improved performance occurs mostly at the 6-month horizon. We also illustrate the practical relevance of predicting the distribution by considering forecasts at three dates during the last recession.
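
The abstract above evaluates quantile forecasts with a quantile score. A minimal sketch of one common form of that score, the check (pinball) loss averaged over observations, is given below. The exact definition used in the article is not given in the abstract, so this form, the function name `quantile_score`, and the sample numbers are assumptions.

```python
import numpy as np

def quantile_score(y, q, tau):
    """Average check (pinball) loss of quantile forecasts q at level tau.

    Each observation contributes (y - q) * (tau - 1{y < q}); lower is better.
    """
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    u = y - q
    return float(np.mean(u * (tau - (u < 0))))

# Hypothetical realized values and a constant forecast of the 0.9 quantile.
realized = [1.0, 2.0, 4.0]
score_high = quantile_score(realized, 2.0, tau=0.9)
score_median = quantile_score(realized, 2.0, tau=0.5)
print(score_high, score_median)
```

At `tau=0.5` this score reduces to half the mean absolute error, and for levels in the tails it penalizes misses on one side more heavily than on the other.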

For longitudinal data, the within-subject dependence structure and covariance parameters may be of practical and theoretical interest. The estimation of covariance parameters has received much attention and has been studied mainly in the framework of generalized estimating equations (GEEs). The GEE method, however, is sensitive to outliers. In this paper, an alternative set of robust generalized estimating equations for both the mean and covariance parameters is proposed in the partial linear model for longitudinal data. The asymptotic properties of the proposed estimators of the regression parameters, nonparametric function and covariance parameters are obtained. Simulation studies are conducted to evaluate the performance of the proposed estimators under different contaminations. The proposed method is illustrated with a real data analysis.

We consider the problem of the sequential choice of design points in an approximately linear model. It is assumed that the fitted linear model is only approximately correct, in that the true response function contains a nonrandom, unknown term orthogonal to the fitted response. We also assume that the parameters are estimated by M-estimation. The goal is to choose the next design point in such a way as to minimize the resulting integrated squared bias of the estimated response, to order $n^{-1}$. Explicit applications to analysis of variance and regression are given. In a simulation study the sequential designs compare favourably with some fixed-sample-size designs which are optimal for the true response to which the sequential designs must adapt.

