Similar literature
 20 similar documents found (search time: 46 ms)
1.
The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most of these identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. But this technique, along with all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and then propose a new technique that uses it to identify multiple influential observations. The advantage of the proposed method in identifying multiple influential cases is then investigated through several frequently cited data sets.

2.
In this paper, two new methods for detecting multiple influential observations, GCD.GSPR and mCD*, are introduced for logistic regression. The proposed diagnostic measures are compared with the generalized difference in fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA), which are multiple influential diagnostics. The simulation study is conducted with logistic regression models with one, two and five independent variables. The performance of the diagnostic measures is examined for a single contaminated independent variable for each model and in the case where all the independent variables are contaminated with certain contamination rates and intensity. In addition, the performance of the diagnostic measures is compared in terms of the correct identification rate and swamping rate on a data set frequently cited in the literature.

3.
Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.

4.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial because they can mask outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression, the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and that creates problems in the correct identification of the cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since they are designed for the identification of a single high leverage point they may not be very effective in the presence of multiple HLP due to their masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression where the suspect cases are identified by a robust group deletion technique and are then confirmed using diagnostic techniques. The usefulness of the proposed method is then investigated through several well-known examples and a Monte Carlo simulation.

5.
Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several frequently cited data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

6.
In high-dimensional regression, the presence of influential observations may lead to inaccurate analysis results, so detecting these unusual points before statistical regression analysis is an important issue. Most of the traditional approaches are, however, based on single-case diagnostics, and they may fail when multiple influential observations are present and mask one another. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure contains two stages. In the first stage, we propose a multiple-case deletion technique and obtain an approximately clean subset of the data that is presumably free of influential observations. In the second stage, we refine the detection rule to enhance efficiency. Monte Carlo simulation studies and a real-life data analysis are used to investigate the performance of the proposed procedure.

7.
Because outliers and leverage observations unduly affect least squares regression, the identification of influential observations is considered an important and integral part of the analysis. However, very few techniques have been developed for residual analysis and diagnostics for the minimum sum of absolute errors (L1) regression. Although L1 regression is more resistant to outliers than least squares regression, it appears that outliers (leverage) in the predictor variables may affect it. In this paper, our objective is to develop an influence measure for L1 regression based on the likelihood displacement function. We illustrate the proposed influence measure with examples.

8.
We introduce and discuss three important regression diagnostics: leverage, Studentized residuals, and DFFITS. We then develop two approaches to bounded-influence robust regression based on these diagnostics. The methods are illustrated on a data set using a simple MINITAB program.
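
As an illustration of the three diagnostics named in this abstract, here is a minimal NumPy sketch under the standard textbook definitions for ordinary least squares. It is not the MINITAB program from the paper; the function name `single_case_diagnostics` and the toy data are assumptions made for this example, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def single_case_diagnostics(X, y):
    """Leverage, externally Studentized residuals and DFFITS for an OLS fit.

    Hypothetical helper written for illustration; an intercept is added.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
    p = Xd.shape[1]

    # Leverage h_ii: diagonal of the hat matrix, computed from a reduced QR.
    Q, _ = np.linalg.qr(Xd)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals from the least squares fit.
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta

    # Error variance estimated with case i deleted (closed form, no refitting).
    s2_del = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)

    # Externally Studentized residuals and DFFITS.
    t = e / np.sqrt(s2_del * (1.0 - h))
    dffits = t * np.sqrt(h / (1.0 - h))
    return h, t, dffits

# Toy data: ten points near y = 1 + 2x, plus one far-out, off-line point.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
noise = np.array([0.1, -0.1] * 5 + [0.0])
y = 1.0 + 2.0 * x + noise
y[-1] = 0.0                                    # contaminate the last response
h, t, dffits = single_case_diagnostics(x.reshape(-1, 1), y)
```

A commonly quoted rule of thumb flags case i when |DFFITS_i| exceeds 2·sqrt(p/n); as several of the abstracts in this list point out, such single-case measures can be misled by masking when several influential points occur together.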

9.
The Liu estimator has been developed as an alternative to the ordinary least squares estimator in the presence of collinearity among the regressors in linear regression models. We present DFFITS and different versions of Cook's distance, analogous to those given for ordinary linear regression models, to assess the influence of each individual observation on the Liu estimates. We suggest a version of Cook's distance based on a one-step approximation. The mean shift outlier model for Liu regression is also investigated. Moreover, using the Sherman-Morrison-Woodbury theorem, we find approximate versions of DFFITS and Cook's distance. The proposed diagnostics are evaluated on two data sets and yield promising results.

10.
A general technique for assessing leverage and influential observations in Generalized Linear Models is described. The procedure takes the form of Half-Normal plots with envelopes derived from simulation to enhance overall assessment of the model. This procedure of assessment is more informative and provides additional insight compared with procedures based on the largest sample leverage and influence statistics. Application of the method is illustrated with an example in logistic regression.

11.
High leverage points can induce or disrupt multicollinearity patterns in data. Observations responsible for this problem are generally known as collinearity-influential observations. A significant amount of published work on the identification of collinearity-influential observations exists; however, we show in this article that all commonly used detection techniques display greatly reduced sensitivity in the presence of multiple high leverage collinearity-influential observations. We propose a new measure based on a diagnostic robust group deletion approach. Some practical cutoff points for existing and newly developed diagnostic measures are also introduced. Numerical examples and simulation results show that the proposed measure provides significant improvement over the existing measures.

12.
Logistic regression modeling has received a great deal of attention in the literature in recent years. This covers all aspects of the logistic regression model, including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.

13.
Logistic regression is frequently used for classifying observations into two groups. Unfortunately, there are often outlying observations in a data set, and these might affect the estimated model and the associated classification error rate. In this paper, the authors study the effect of observations in the training sample on the error rate by deriving influence functions. They obtain a general expression for the influence function of the error rate, and they compute it for the maximum likelihood estimator as well as for several robust logistic discrimination procedures. Besides being of interest in their own right, the influence functions are also used to derive asymptotic classification efficiencies of different logistic discrimination rules. The authors also show how influential points can be detected by means of a diagnostic plot based on the values of the influence function.

14.
Many sampling problems from multiple populations can be considered under the semiparametric framework of the biased, or weighted, sampling model. Included under this framework is logistic regression under case–control sampling. For any model, atypical observations can greatly influence the maximum likelihood estimate of the parameters. Several robust alternatives have been proposed for the special case of logistic regression. However, some current techniques can exhibit poor behavior in many common situations. In this paper a new family of procedures is constructed to estimate the parameters in the semiparametric biased sampling model. The procedures incorporate a minimum distance approach, but are instead based on characteristic functions. The estimators can also be represented as the minimizers of quadratic forms in simple residuals, thus yielding straightforward computation. For the case of logistic regression, the resulting estimators are shown to be competitive with the existing robust approaches in terms of both robustness and efficiency, while maintaining affine equivariance. The approach is developed under the case–control sampling scheme, yet is shown to be applicable under prospective sampling logistic regression as well.

15.
The detection of outliers and influential observations has received a great deal of attention in the statistical literature in the context of least-squares (LS) regression. However, the explanatory variables can be correlated with each other, and alternatives to LS have emerged to address outliers/influential observations and multicollinearity simultaneously. This paper proposes new influence measures based on the affine combination type regression for the detection of influential observations in the linear regression model when multicollinearity exists. Approximate influence measures are also proposed for the affine combination type regression. Since the affine combination type regression includes the ridge, the Liu and the shrunken regressions as special cases, influence measures under the ridge, the Liu and the shrunken regressions are also examined to see the possible effect that multicollinearity can have on the influence of an observation. The Longley data set is used to illustrate the influence measures in affine combination type regression and also in ridge, Liu and shrunken regressions, so that the performance of different biased regressions in detecting and assessing influential observations can be examined.

16.
Leverage values are used in regression diagnostics as measures of influential observations in the $X$-space. Detection of high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Much work has been done on the identification of single high leverage points, and it is generally believed that the problem of detecting a single high leverage point has been largely resolved. But there is no general agreement among statisticians about the detection of multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of the masking and/or swamping effects. On the other hand, the robust alternative methods can identify the high leverage points correctly, but they tend to flag too many low leverage points as high leverage points, which is also undesirable. An attempt has been made to find a compromise between these two approaches. We propose an adaptive method in which the suspected high leverage points are identified by robust methods and then the low leverage points (if any) are put back into the estimation data set after diagnostic checking. The usefulness of the proposed method for the detection of multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.

17.
The detection of influential observations on the estimation of the dimension reduction subspace returned by Sliced Inverse Regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.

18.
19.
The local influence method introduced by Cook is adapted to multivariate normal data for the purpose of detecting outliers. The method allows simultaneous perturbations on all observations, so that it can identify multiple outliers. An illustrative example is given to show the effectiveness of the method for the identification of influential observations.

20.
A new approach is presented for testing independence in contingency tables with clustered observations. The approach is based on the framework of generalized linear mixed models. Under the multinomial logistic link function, the category counts are modelled with random cluster effects and a modified likelihood ratio statistic is used for testing independence. The method is applicable to multi-way tables, and can accommodate multiple levels of clustering. It is illustrated using a benchmark dataset.
