Identification of multiple high leverage points in logistic regression |
| |
Authors: | A.H.M. Rahmatullah Imon; Ali S. Hadi |
| |
Affiliation: | 1. Department of Mathematical Sciences, Ball State University, Muncie, IN, USA; 2. Department of Mathematics, The American University in Cairo, Cairo, Egypt |
| |
Abstract: | Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detecting high leverage observations or points is crucial because they can mask outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression, the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and this creates problems in the correct identification of such cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since these methods are designed for the identification of a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression in which the suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation. |
| |
Keywords: | logistic regression; covariates; high leverage points; masking; swamping; group deletion; robust regression; deletion median distance from the median; Monte Carlo simulation |
|
|
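The abstract's point that an extreme covariate value can receive a misleadingly small leverage in logistic regression can be illustrated with a minimal sketch. It assumes the standard logistic hat matrix H = V^{1/2} X (X'VX)^{-1} X' V^{1/2} with V = diag(p_i(1 - p_i)), which the abstract does not state explicitly. The data, the coefficient vector `beta`, and the function names are hypothetical and chosen only for illustration; this is not the method proposed in the paper.

```python
import numpy as np

def linear_leverages(X):
    """Diagonal of the linear-regression hat matrix X (X'X)^{-1} X'."""
    X = np.asarray(X, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

def logistic_leverages(X, beta):
    """Diagonal of V^{1/2} X (X'VX)^{-1} X' V^{1/2}, with V = diag(p(1-p)).

    Assumed form of the logistic leverage; beta is taken as given here
    rather than estimated from responses.
    """
    X = np.asarray(X, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
    v = p * (1.0 - p)                        # the weight component
    Xw = X * np.sqrt(v)[:, None]             # rows scaled by sqrt(v_i)
    XtVX_inv = np.linalg.inv(Xw.T @ Xw)
    return np.einsum("ij,jk,ik->i", Xw, XtVX_inv, Xw)

# Hypothetical covariate values: nine ordinary points and one extreme point.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 12.0])
X = np.column_stack([np.ones_like(x), x])   # intercept plus one covariate
beta = np.array([0.0, 1.0])                 # assumed coefficients

h_lin = linear_leverages(X)
h_log = logistic_leverages(X, beta)

print("linear leverage of extreme point:  ", round(h_lin[-1], 4))
print("logistic leverage of extreme point:", round(h_log[-1], 6))
```

In this sketch the point at x = 12 has the largest linear-regression leverage but the smallest logistic leverage, because its fitted probability is close to 1 and the weight p(1 - p) is close to 0.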