The identification of influential observations in logistic regression has drawn a great deal of attention in recent years. Most of the available techniques like Cook's distance and difference of fits (DFFITS) are based on single-case deletion. But there is evidence that these techniques suffer from masking and swamping problems and consequently fail to detect multiple influential observations. In this paper, we have developed a new measure for the identification of multiple influential observations in logistic regression based on a generalized version of DFFITS. The advantage of the proposed method is then investigated through several well-referred data sets and a simulation study.

Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures that are based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of the Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.
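For reference, the single-case Cook's distance mentioned in this abstract can be written as follows for ordinary least squares. The notation is assumed here and is not taken from the abstract: $X$ is the $n \times p$ design matrix, $\hat\beta_{(i)}$ is the estimate with case $i$ deleted, $e_i$ is the ordinary residual, $h_{ii}$ is the leverage, and $s^2$ is the residual mean square. The second display is the standard group-deletion analogue for an index set $I$; it is not the modification proposed in the paper.

```latex
\[
D_i \;=\; \frac{(\hat\beta - \hat\beta_{(i)})^{\top} X^{\top} X \,(\hat\beta - \hat\beta_{(i)})}{p\, s^{2}}
    \;=\; \frac{e_i^{2}}{p\, s^{2}} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}}
\]
\[
D_I \;=\; \frac{(\hat\beta - \hat\beta_{(I)})^{\top} X^{\top} X \,(\hat\beta - \hat\beta_{(I)})}{p\, s^{2}}
\]
```

Masking arises because $D_i$ deletes only one case at a time: when several influential cases sit together, removing any one of them barely moves $\hat\beta$, so each $D_i$ stays small even though $D_I$ for the whole group can be large.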

We present influence diagnostics for linear measurement error models with stochastic linear restrictions using the corrected likelihood of Nakamura in 1990. The case deletion and mean shift outlier models are developed to identify outlying and influential observations. We derive a corrected score test statistic for outlier detection based on mean shift outlier models. The analogs of Cook's distance and likelihood distance are proposed to determine influential observations based on case deletion models. A parametric bootstrap procedure is used to obtain empirical distributions of the test statistics and a simulation study has been used to evaluate the performance of the proposed estimators based on the mean squared error criterion and the score test statistic. Finally, a numerical example is given to illustrate the theoretical results.


In high-dimensional regression, the presence of influential observations may lead to inaccurate results, so detecting these unusual points before the regression analysis is an important issue. Most of the traditional approaches are, however, based on single-case diagnostics, and they may fail in the presence of multiple influential observations because of masking effects. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure contains two stages. First, we propose a multiple-case deletion technique and obtain an approximate clean subset of the data that is presumably free of influential observations. To enhance efficiency, in the second stage, we refine the detection rule. Monte Carlo simulation studies and a real-life data analysis are used to investigate the performance of the proposed procedure.

We introduce and discuss three important regression diagnostics: leverage, Studentized residuals, and DFFITS. We then develop two approaches to bounded-influence robust regression based on these diagnostics. The methods are illustrated on a data set using a simple MINITAB program.
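The three diagnostics named in this abstract have standard closed forms for ordinary least squares, and a minimal NumPy sketch of them is given below. It is an illustration only, not the MINITAB program from the paper. The function name `regression_diagnostics` is hypothetical, and the sketch assumes that `X` already contains an intercept column, has full column rank, and has more than p + 1 rows.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Return leverage, externally Studentized residuals, and DFFITS for OLS.

    Assumes X (n x p) includes the intercept column and has full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverage h_ii: diagonal of the hat matrix, computed from the thin QR factor.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals from the full-data fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sse = e @ e

    # Leave-one-out variance estimate s_(i)^2, obtained without refitting.
    s2_del = (sse - e**2 / (1.0 - h)) / (n - p - 1)

    # Externally Studentized residuals and DFFITS.
    t = e / np.sqrt(s2_del * (1.0 - h))
    dffits = t * np.sqrt(h / (1.0 - h))
    return h, t, dffits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=20)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + rng.normal(size=20)
    h, t, dffits = regression_diagnostics(X, y)
    print(np.round(h, 3))
    print(np.round(dffits, 3))
```

All three quantities are single-case measures, so they are subject to the masking and swamping effects that the other abstracts on this page discuss.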

High leverage points can induce or disrupt multicollinearity patterns in data. Observations responsible for this problem are generally known as collinearity-influential observations. A significant amount of published work on the identification of collinearity-influential observations exists; however, we show in this article that all commonly used detection techniques display greatly reduced sensitivity in the presence of multiple high leverage collinearity-influential observations. We propose a new measure based on a diagnostic robust group deletion approach. Some practical cutoff points for existing and developed diagnostics measures are also introduced. Numerical examples and simulation results show that the proposed measure provides significant improvement over the existing measures.

In this paper, two new multiple influential observation detection methods, GCD.GSPR and mCD*, are introduced for logistic regression. The proposed diagnostic measures are compared with the generalized difference in fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA), which are multiple influential diagnostics. The simulation study is conducted with one, two and five independent variable logistic regression models. The performance of the diagnostic measures is examined for a single contaminated independent variable for each model and in the case where all the independent variables are contaminated with certain contamination rates and intensity. In addition, the performance of the diagnostic measures is compared in terms of the correct identification rate and swamping rate via a frequently referred to data set in the literature.

Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments through several well-referred data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

Leverage values are being used in regression diagnostics as measures of unusual observations in the X-space. Detection of high leverage observations or points is crucial due to their responsibility for masking outliers. In linear regression, high leverage points (HLP) are those that stand far apart from the center (mean) of the data and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression, the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, and that creates problems in the correct identification of the cases. Attempts have been made to identify the HLP based on the median distances from the mean, but since they are designed for the identification of a single high leverage point they may not be very effective in the presence of multiple HLP due to their masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression where the suspect cases are identified by a robust group deletion technique and they are confirmed using diagnostic techniques. The usefulness of the proposed method is then investigated through several well-known examples and a Monte Carlo simulation.
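The point about misleadingly small leverage can be illustrated with the usual logistic-regression hat values, h_ii = v_i x_i' (X'VX)^{-1} x_i with v_i = p_i(1 - p_i), where p_i is the fitted probability. The sketch below is an assumption-laden illustration, not the method proposed in the paper. The function name `logistic_leverage` is hypothetical, and the probabilities are supplied by the caller rather than produced by an actual model fit.

```python
import numpy as np

def logistic_leverage(X, pi_hat):
    """Diagonal of the logistic-regression hat matrix V^(1/2) X (X'VX)^(-1) X' V^(1/2).

    X is the n x p design matrix (intercept column included) and pi_hat holds
    fitted probabilities strictly between 0 and 1.
    """
    X = np.asarray(X, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    v = pi_hat * (1.0 - pi_hat)              # binomial variance weights v_i
    XtVX = X.T @ (v[:, None] * X)            # p x p weighted cross-product
    A = np.linalg.solve(XtVX, X.T)           # (X'VX)^(-1) X', shape p x n
    quad = np.einsum("ij,ji->i", X, A)       # x_i' (X'VX)^(-1) x_i
    return v * quad


if __name__ == "__main__":
    # Illustrative data: the last case is far from the others in the X-space.
    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 10.0])
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([0.0, 1.0])              # assumed coefficients, not a real fit
    pi_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    print(np.round(logistic_leverage(X, pi_hat), 4))
```

In this example the extreme case has a fitted probability very close to 1, so its weight v_i is nearly zero and its hat value ends up smaller than that of a moderate case such as x = 2, even though it is the most remote point in the covariate space.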

To assess the influence of observations on the parameter estimates, case deletion diagnostics are commonly used in linear regression models. For linear models with correlated errors, we study the influence of observations on testing a linear hypothesis using single and multiple case deletions. The changes in the likelihood ratio test and the F test are derived theoretically, and these tests are shown to be completely determined by two proposed generalized externally studentized residuals. An illustrative example of a real data set is also reported.

In this article, we propose a bivariate long-term distribution based on the Farlie-Gumbel-Morgenstern copula model. The proposed model allows for the presence of censored data and covariates. For inferential purposes, a Bayesian approach via Markov Chain Monte Carlo (MCMC) is considered. Further, some discussions on the model selection criteria are given. In order to examine outlying and influential observations, we present Bayesian case deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated on artificial and real data.

Leverage values are being used in regression diagnostics as measures of influential observations in the $X$-space. Detection of high leverage values is crucial because they can lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, mask and/or swamp outliers, etc. Much work has been done on the identification of single high leverage points and it is generally believed that the problem of detection of a single high leverage point has been largely resolved. But there is no general agreement among statisticians about the detection of multiple high leverage points. When a group of high leverage points is present in a data set, the commonly used diagnostic methods fail to identify them correctly, mainly because of the masking and/or swamping effects. On the other hand, the robust alternative methods can identify the high leverage points correctly, but they have a tendency to identify too many low leverage points as high leverage points, which is also undesirable. An attempt has been made to make a compromise between these two approaches. We propose an adaptive method where the suspected high leverage points are identified by robust methods and then the low leverage points (if any) are put back into the estimation data set after diagnostic checking. The usefulness of our newly proposed method for the detection of multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.

The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.

The Liu estimator has been developed as an alternative to the ordinary least squares estimator in the presence of collinearity among the elements of regressors in linear regression models. We present the DFFITS and different versions of the Cook distance analogous to the ones given for the ordinary linear regression models of each individual observation on the Liu estimates. We suggest a version of the Cook distance based on one-step approximation. The mean shift outlier model for the Liu regression has also been investigated. Moreover, using the Sherman-Morrison-Woodbury theorem, we find approximate versions of the DFFITS and the Cook distance. The proposed diagnostics are evaluated on two data sets and yield promising results.
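As a reminder of how the Sherman-Morrison-Woodbury identity yields deletion formulas without refitting, the rank-one case for ordinary least squares is sketched below. This is the classical OLS result, shown only as an illustration; the Liu-estimator versions derived in the paper are not reproduced here. The symbols are assumed notation: $x_i$ is the $i$th row of $X$ written as a column vector, $e_i$ is the ordinary residual, and $h_{ii} = x_i^{\top}(X^{\top}X)^{-1}x_i$.

```latex
\[
\bigl(X^{\top}X - x_i x_i^{\top}\bigr)^{-1}
  \;=\; (X^{\top}X)^{-1}
  \;+\; \frac{(X^{\top}X)^{-1} x_i x_i^{\top} (X^{\top}X)^{-1}}{1 - h_{ii}}
\]
\[
\hat\beta_{(i)} \;=\; \hat\beta \;-\; \frac{(X^{\top}X)^{-1} x_i \, e_i}{1 - h_{ii}}
\]
```

Because $\hat\beta_{(i)}$ follows from full-data quantities alone, deletion measures such as DFFITS and Cook's distance can be computed for every case from a single fit.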

This paper examines local influence assessment in generalized autoregressive conditional heteroscedasticity models with Gaussian and Student-t errors, where influence is examined via the likelihood displacement. The analysis of local influence is discussed under three perturbation schemes: data perturbation, innovative model perturbation and additive model perturbation. For each case, expressions for slope and curvature diagnostics are derived. Monte Carlo experiments are presented to determine the threshold values for locating influential observations. The empirical study of daily returns of the New York Stock Exchange composite index shows that local influence analysis is a useful technique for detecting influential observations; most of the observations detected as influential are associated with historical shocks in the market. Finally, based on this empirical study and the analysis of simulated data, some advice is given on how to use the discussed methodology.


Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, the identification of influential observations that may be potential outliers is an important step in the analysis of CGLMs. We develop multiple case-deletion diagnostics for detecting influential observations in the CGLMs. The diagnostics are functions of basic building blocks: studentized residuals, error contrast matrix, and the inverse of the response variable covariance matrix. The basic building blocks are computed only once from the complete data analysis and provide information on the influence of the data on different aspects of the model fit. Computational formulas are given which make the procedures feasible. An illustrative example with a real data set is also reported.

Normality and independence of error terms are typical assumptions for partial linear models. However, these assumptions may be unrealistic in many fields, such as economics, finance and biostatistics. In this paper, a Bayesian analysis of the partial linear model with first-order autoregressive errors belonging to the class of the scale mixtures of normal distributions is studied in detail. The proposed model provides a useful generalization of the symmetrical linear regression model with independent errors, since the distribution of the error term covers both correlated and thick-tailed distributions, and has a convenient hierarchical representation allowing easy implementation of a Markov chain Monte Carlo scheme. In order to examine the robustness of the model against outlying and influential observations, Bayesian case deletion influence diagnostics based on the Kullback–Leibler (K–L) divergence are presented. The proposed method is applied to monthly and daily returns of two Chilean companies.

In regression, detecting anomalous observations is a significant step in the model-building process. Various influence measures based on different motivational arguments are designed to measure the influence of observations through different aspects of various regression models. The presence of influential observations in the data is complicated by the existence of multicollinearity. The purpose of this paper is to assess the influence of observations in the Liu [9] and modified Liu [15] estimators by using the method of approximate case deletion formulas suggested by Walker and Birch [14]. A numerical example using a real data set used by Longley [10] and a Monte Carlo simulation are given to illustrate the theoretical results.

In this paper, we define a multiple cases deletion model (MCDM) in linear measurement error models (LMEMs). Then, by using the corrected score method of Nakamura (1990), the estimation of parameters is obtained. Furthermore, based on MCDM, we provide computationally inexpensive deletion diagnostic tools for LMEMs. An example illustrates that our method is useful for diagnosing influential subsets of observations.

Local Influence in Generalized Estimating Equations
Abstract. We investigate the influence of subjects or observations on regression coefficients of generalized estimating equations (GEEs) using local influence. The GEE approach does not require the full multivariate distribution of the response vector. We extend the likelihood displacement to a quasi-likelihood displacement, and propose local influence diagnostics under several perturbation schemes. An illustrative example in GEEs is given and we compare the results using the local influence and deletion methods.
