Similar literature
 20 similar documents found (search time: 31 ms)
1.
Leverage values are used in regression diagnostics as measures of influential observations in the $X$-space. Detecting high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Much work has been done on the identification of single high leverage points, and it is generally believed that the problem of detecting a single high leverage point has been largely resolved. There is, however, no general agreement among statisticians about the detection of multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of masking and/or swamping effects. Robust alternative methods, on the other hand, can identify the high leverage points correctly, but they tend to flag too many low leverage points as high leverage points, which is also undesirable. An attempt has been made to find a compromise between these two approaches. We propose an adaptive method in which the suspected high leverage points are identified by robust methods and the low leverage points (if any) are then put back into the estimation data set after diagnostic checking. The usefulness of the proposed method for the detection of multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.
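
To make the masking effect mentioned in this abstract concrete, the following minimal sketch computes classical leverage values for a simple linear regression with one predictor and an intercept. It is an illustration only, not the adaptive method the authors propose; the function names, the toy data, and the conventional cutoff of twice the mean leverage (2p/n) are assumptions introduced here.

```python
from statistics import mean


def leverages(x):
    """Diagonal of the hat matrix for y = a + b*x (intercept plus one predictor)."""
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


def flag_high_leverage(x, multiplier=2.0):
    """Indices whose leverage exceeds multiplier * p / n (assumed rule of thumb)."""
    p = 2  # number of fitted parameters: intercept and slope
    cutoff = multiplier * p / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Hypothetical toy data: one extreme x-value is flagged ...
single = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
print(flag_high_leverage(single))   # [5]

# ... but two extreme x-values close together hide each other.
double = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0, 21.0]
print(flag_high_leverage(double))   # []
```

In the second sample the two extreme points shift the mean and inflate the spread of x, so neither exceeds the cutoff. This is the kind of failure of classical diagnostics that motivates group-deletion and robust approaches.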

2.
Regression analysis aims to estimate the approximate relationship between the response variable and the explanatory variables. This can be done using classical methods such as ordinary least squares. Unfortunately, these methods are very sensitive to anomalous points, often called outliers, in the data set. The main contribution of this article is to propose a new version of the Generalized M-estimator that provides good resistance against vertical outliers and bad leverage points. The advantage of this method over the existing methods is that it does not minimize the weight of the good leverage points, and this increases the efficiency of this estimator. To achieve this goal, the fixed parameters support vector regression technique is used to identify and minimize the weight of outliers and bad leverage points. The effectiveness of the proposed estimator is investigated using real and simulated data sets.

3.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians because of the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several widely referenced data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

4.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial because they are responsible for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and this creates problems in the correct identification of such cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since they are designed for the identification of a single high leverage point they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression in which the suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is then investigated through several well-known examples and a Monte Carlo simulation.

5.
Selection of relevant predictor variables for building a model is an important problem in multiple linear regression. Variable selection methods based on the ordinary least squares estimator fail to select the set of relevant variables in the presence of outliers and leverage points. In this article, we propose a new robust variable selection criterion for selecting the relevant variables in the model and establish its consistency property. The performance of the proposed method is evaluated through a simulation study and real data.

6.
7.
In linear regression, outliers and leverage points often have large influence in the model selection process. Such cases are downweighted with Mallows-type weights here, during estimation of submodel parameters by generalised M-estimation. A robust version of Mallows's Cp (Ronchetti & Staudte, 1994) is then used to select a variety of submodels which are as informative as the full model. The methodology is illustrated on a new dataset concerning the agglomeration of alumina in Bayer precipitation.

8.
The penalized maximum likelihood estimator (PMLE) has been widely used for variable selection in high-dimensional data. Various penalty functions have been employed for this purpose, e.g., Lasso, weighted Lasso, or smoothly clipped absolute deviations. However, the PMLE can be very sensitive to outliers in the data, especially to outliers in the covariates (leverage points). In order to overcome this disadvantage, the usage of the penalized maximum trimmed likelihood estimator (PMTLE) is proposed to estimate the unknown parameters in a robust way. The computation of the PMTLE takes advantage of the same technology as used for PMLE but here the estimation is based on subsamples only. The breakdown point properties of the PMTLE are discussed using the notion of $d$-fullness. The performance of the proposed estimator is evaluated in a simulation study for the classical multiple linear and Poisson linear regression models.

9.
Because outliers and leverage observations unduly affect the least squares regression, the identification of influential observations is considered an important and integral part of the analysis. However, very few techniques have been developed for residual analysis and diagnostics for minimum sum of absolute errors (L1) regression. Although L1 regression is more resistant to outliers than least squares regression, it appears that outliers in the predictor variables (leverage points) may affect it. In this paper, our objective is to develop an influence measure for L1 regression based on the likelihood displacement function. We illustrate the proposed influence measure with examples.

10.
This paper studies the outlier detection and robust variable selection problem in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than the existing methods when there are leverage points or outliers in the response variable or explanatory variables. Finally, we apply the proposed methodology to analyze two real datasets.

11.
This paper studies outlier detection and accommodation in general spatial models, which include spatial autoregressive models and spatial error models as special cases. Using mean-shift and variance-weight models respectively, test statistics for multiple outliers are derived and detection procedures are proposed. In addition, several key diagnostic measures such as standardized residuals and leverage measures are defined in general spatial models. Outlier modified models are proposed to accommodate outliers in the data set. The performance of the test statistics, including size and power, is examined via simulation studies. Three real examples are analyzed and the results show that the proposed methodology is useful for identifying and accommodating outliers in general spatial models.

12.
Universal kriging is a form of interpolation that takes into account the local trends in data when minimizing the error associated with the estimator. Under multivariate normality assumptions, the given predictor is the best linear unbiased predictor, but if the underlying distribution is not normal, the estimator will not be unbiased and will be vulnerable to outliers. With spatial data, it is not only the presence of outliers that may spoil the predictions, but also the boundary sites, usually corners, that tend to have high leverage. As an alternative, a weighted one-step generalized M estimator of the location parameters in a spatial linear model is proposed. It is especially recommended in the case of irregularly spaced data.

13.
In biomedical and epidemiological studies, gene–environment (G–E) interactions have been shown to importantly contribute to the etiology and progression of many complex diseases. Most existing approaches for identifying G–E interactions are limited by the lack of robustness against outliers/contaminations in response and predictor spaces. In this study, we develop a novel robust G–E identification approach using the trimmed regression technique under joint modelling. A robust data-driven criterion and stability selection are adopted to determine the trimmed subset which is free from both vertical outliers and leverage points. An effective penalization approach is developed to identify important G–E interactions, respecting the ‘main effects, interactions’ hierarchical structure. Extensive simulations demonstrate the better performance of the proposed approach compared to multiple alternatives. Interesting findings with superior prediction accuracy and stability are observed in the analysis of The Cancer Genome Atlas data on cutaneous melanoma and breast invasive carcinoma.

14.
This paper presents a one-step robust generalised M-estimation for orthogonal regression. The GM-estimator uses Schweppe weights which are based on high breakdown initial and scale estimates to downweight outliers and high leverage points. The one-step iteratively reweighted least squares procedure was used to compute the GM estimates. The robustness of the GM-estimator is shown by results on concrete compressive strength measurements.

15.
We consider the assessment of outliers and influential observations in non-linear measurement error models. Residuals, leverage measures and case-deletion diagnostics are examined. The method of local influence is also applied to the models. In particular, the perturbation of measurement error variances has been found useful in assessing the adequacy of the model assumptions. A numerical example is given to illustrate the application of the diagnostics.

16.
A simple analytical expression is derived for leverage in ridge regression. Leverage is shown to be a monotonically decreasing function of the value of the ridge parameter. This reduction in leverage is greatest for those observations lying substantially in the direction of the minor principal axes. Thus, ridge estimation copes with outliers in regressor space by downweighting their influence. A brief illustration is provided.
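
The behaviour described in this abstract can be checked numerically with a small sketch. It evaluates the diagonal of the standard ridge hat matrix X(X'X + kI)^{-1}X' for a two-column design without an intercept; the paper's own analytical expression is not reproduced on this page, and the function name and toy data below are assumptions made for illustration.

```python
def ridge_leverages(rows, k):
    """Diagonal of X (X'X + k I)^(-1) X' for a two-column design matrix X."""
    # Entries of the symmetric 2x2 matrix X'X + k I.
    a = sum(r[0] * r[0] for r in rows) + k
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows) + k
    det = a * d - b * b
    # Quadratic form x' [[d, -b], [-b, a]] x / det, using the closed-form inverse.
    return [(d * r[0] ** 2 - 2 * b * r[0] * r[1] + a * r[1] ** 2) / det
            for r in rows]


# Hypothetical toy design: the first column carries most of the variation
# (major principal axis), the second carries little (minor principal axis).
rows = [(3.0, 0.0), (-3.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

h_ols = ridge_leverages(rows, 0.0)    # ordinary least squares leverages
h_ridge = ridge_leverages(rows, 2.0)  # leverages with ridge parameter k = 2

for x, h0, h2 in zip(rows, h_ols, h_ridge):
    print(x, round(h0, 3), round(h2, 3))
```

With this design every leverage equals 0.5 at k = 0. At k = 2 the observations on the major axis drop to 0.45, while those on the minor axis drop to 0.25, consistent with the claim that the reduction is greatest along the minor principal axes.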

17.
Outliers in multilevel data (total citations: 2; self-citations: 0; citations by others: 2)
This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

18.

The Mallows-type estimator, one of the most reasonable bounded influence estimators, often downweights leverage points regardless of the magnitude of the corresponding residual, and this could imply a loss of efficiency. In this article, we consider whether the efficiency of this bounded influence estimator could be improved by taking into account both the robust $x$-distance and the residual size. We develop a new robust procedure based on the ideas of the Mallows-type estimator and the general robust recipe, in which data are cleaned by pulling outliers towards their fitted values. Our basic idea is to formulate the robust estimation as an allocation problem, where the objective function is a Huber-type "loss" function, but the pulling resource is restricted. Using a mathematical programming technique, the pulling resource is optimally allocated to influential points $(x_i, y_i)$ with respect to residual size and given weights, $w(x_i)$. Three previously published approaches are compared to our proposal via simulated experiments. When the data are contaminated by regression outliers and "good" leverage points, the proposed robust estimator is a reasonable bounded influence estimator with respect to both efficiency and norm of bias. In addition, the proposed approach offers the potential to establish constraints for the regression parameters and may also provide insight regarding outlier detection.

19.
20.
Robust regression has not had a great impact on statistical practice, although all statisticians are convinced of its importance. The procedures for robust regression currently available are complex and computer intensive. With a modification of the Gaussian paradigm, taking into consideration outliers and leverage points, we propose an iteratively weighted least squares method which gives robust fits. The procedure is illustrated by applying it to data sets which have previously been used to illustrate robust regression methods. It is hoped that this simple, effective and accessible method will find its use in statistical practice.
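
As a rough illustration of what an iteratively weighted least squares fit looks like, the sketch below fits a straight line and downweights large residuals with Huber-type weights. This is a generic textbook scheme, not the modified-Gaussian weighting proposed in the paper (which this abstract does not specify); the function names, the tuning constant 1.345, the scale estimate, and the toy data are all assumptions.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def irls_huber(x, y, c=1.345, iterations=50):
    """Iteratively reweighted least squares with Huber-type weights (assumed scheme)."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        intercept, slope = weighted_line(x, y, w)
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors.
        scale = median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Full weight for small residuals, shrinking weight for large ones.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return intercept, slope, w


# Hypothetical toy data: y = 1 + 2x with small alternating noise and one gross outlier.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
y[4] = 40.0

ols_intercept, ols_slope = weighted_line(x, y, [1.0] * len(x))
rob_intercept, rob_slope, weights = irls_huber(x, y)
print("OLS slope:", ols_slope, "robust slope:", rob_slope)
```

Note that weights based on residuals alone protect against vertical outliers only; handling leverage points, as the abstract mentions, would additionally require weights that depend on the position of each observation in the predictor space.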
