Similar Literature
1.
In this article, we propose two novel case-deletion diagnostic measures for identifying observations that are influential on the regression parameters of generalized linear models. The proposed diagnostic methods are capable of detecting influential observations under model misspecification, as long as the true underlying distributions have finite second moments. More specifically, it is demonstrated that the Poisson likelihood function can be properly adjusted to become asymptotically valid for practically all underlying discrete distributions. The adjusted Poisson regression model that achieves this robustness property is presented. Simulation studies and an illustrative example demonstrate the efficacy of the two novel diagnostic procedures.

2.
The identification of influential observations in logistic regression has drawn a great deal of attention in recent years. Most of the available techniques, such as Cook's distance and the difference of fits (DFFITS), are based on single-case deletion. However, there is evidence that these techniques suffer from masking and swamping problems and consequently fail to detect multiple influential observations. In this paper, we develop a new measure for identifying multiple influential observations in logistic regression, based on a generalized version of DFFITS. The advantages of the proposed method are then investigated through several well-referenced data sets and a simulation study.

3.
In fitting a regression model, one or more observations may have substantial effects on the estimators. Such unusual observations can be detected precisely by a new diagnostic measure, Pena's statistic. In this article, we introduce a version of Pena's statistic for each point in Liu regression. Using the forecast change property, we simplify Pena's statistic numerically. The simplified statistic is found to perform well in detecting influential observations. We express Pena's statistic in terms of the Liu leverages and residuals. The normality of this statistic is also discussed, and it is shown that the statistic can identify a subset of high-Liu-leverage outliers. For numerical evaluation, simulation studies are given, and a real data set is analysed for illustration.
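For reference, Pena's statistic in ordinary least squares (not the Liu-regression version studied in this paper, whose Liu leverages are not reproduced here) can be written with ordinary leverages $h_{ij}$, residuals $e_j$, residual variance $s^2$ and $p$ parameters. The notation is ours; the second line substitutes the standard deletion identity $\hat y_i - \hat y_{i(j)} = h_{ij} e_j / (1 - h_{jj})$, which is the kind of "leverages and residuals" form the abstract describes.

```latex
S_i = \frac{\mathbf{s}_i^{\top}\mathbf{s}_i}{p\, s^2 h_{ii}},
\qquad
\mathbf{s}_i = \bigl(\hat y_i - \hat y_{i(1)}, \dots, \hat y_i - \hat y_{i(n)}\bigr)^{\top},

S_i = \frac{1}{p\, s^2 h_{ii}} \sum_{j=1}^{n} \frac{h_{ij}^2\, e_j^2}{(1 - h_{jj})^2}.
```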

4.
We consider the asymptotic distribution of divergence-based influence measures, which extend to polytomous logistic regression an influence measure proposed by Johnson (1985; Influence measures for logistic regression: Another point of view, Biometrika 72: 59–65) for binary logistic regression. A numerical example compares the classical Cook's distance with the divergence-based influence measures.

5.
Constrained general linear models (CGLMs) have wide applications in practice. As in other data analyses, identifying influential observations, which may be potential outliers, is an important step in the analysis of CGLMs. We develop multiple-case deletion diagnostics for detecting influential observations in CGLMs. The diagnostics are functions of basic building blocks: studentized residuals, the error contrast matrix, and the inverse of the response-variable covariance matrix. These building blocks are computed only once from the complete-data analysis and provide information on the influence of the data on different aspects of the model fit. Computational formulas are given that make the procedures feasible. An illustrative example with a real data set is also reported.

6.
The detection of influential observations on the estimation of the dimension reduction subspace returned by Sliced Inverse Regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.

7.
Modeling diagnostics assess models by means of a variety of criteria, each typically evaluating a specific inferential objective. For instance, the well-known DFBETAS in linear regression models are a modeling diagnostic used to discover influential cases in fitting a model. To facilitate the evaluation of generalized linear mixed models (GLMMs), we develop a diagnostic based on the information complexity (ICOMP) criterion for detecting influential cases that substantially affect model selection by ICOMP. For a given model, the diagnostic compares the ICOMP criterion between the full data set and a case-deleted data set. The ICOMP criterion is computed using the Fisher information matrix. A simulation study is conducted, and a real data set of cancer cells is analyzed using a logistic linear mixed model to illustrate the effectiveness of the proposed diagnostic in detecting influential cases.

8.
The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. However, this technique, along with all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper, we develop a generalized version of DFFITS based on group deletion and then use it to propose a new technique for identifying multiple influential observations. The advantage of the proposed method for identifying multiple influential cases is then investigated through several well-referenced data sets.
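The single-case DFFITS that the group-deletion generalization builds on can be sketched for a straight-line least-squares fit (p = 2). This is a minimal illustration under that assumption, not the authors' group-deletion method; `fit_line` and `dffits` are hypothetical helper names, and the deleted-case variance uses the standard update formula instead of refitting n times.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def dffits(x, y):
    """Single-case-deletion DFFITS_i = t_i * sqrt(h_ii / (1 - h_ii)) for a line fit."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx           # leverage h_ii
        # residual variance with case i deleted: (SSE - e_i^2/(1-h_ii)) / (n-p-1)
        s2_i = (sse - e * e / (1 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_i * (1 - h))             # externally studentized residual
        out.append(t * math.sqrt(h / (1 - h)))
    return out
```

Each value equals the scaled change in the i-th fitted value when case i is dropped, so it can be checked by brute-force refitting; on data with one far-out point (for example x = 10 among x = 1..5) that point gets the largest |DFFITS|.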

9.
It sometimes happens that one or more components of the data exert a disproportionate influence on model estimation. We need a reliable tool for identifying such troublesome cases in order to decide whether to eliminate them from the sample, when the data were poorly collected, or otherwise to use the model with care, because the results could be affected by such components. Since Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18] proposed a measure for detecting influential cases in the linear regression setting, several new single-case diagnostics have been suggested, in addition to extensions of Cook's measure to other models. Cutoff values have been recommended for most of them (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, 2004]); however, the lack of a quantile-type cutoff for Cook's statistic has led analysts to rely on index plots alone as diagnostic tools. Focusing on logistic regression, this paper derives the asymptotic distribution of Cook's distance in order to find a meaningful cutoff point for detecting influential and leverage observations.

10.
Various diagnostic statistics have been proposed to help identify cases that markedly affect, or influence, the features of a fitted linear regression model. Once influential cases are found, decisions can be made regarding their worth in the model building process. Since a subject data set may contain both singly influential cases and influential multiple case subsets, the capability to assess the joint influence of cases is needed for a complete analysis. The aim of this work is to briefly review Cook's distance measure for multiple cases, an effective diagnostic for this purpose, and present a method using it to search for influential multiple case subsets. The method is applied in two example analyses by way of a MINITAB Statistical Software macro.
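Cook's distance for a subset I of cases can be computed directly by refitting without those cases, D_I = Σ_j (ŷ_j − ŷ_j(I))² / (p s²). The sketch below does this for a straight-line fit (p = 2) and searches all subsets of a given size exhaustively. That brute-force search is a stand-in chosen for illustration, not the search procedure or MINITAB macro described in the paper, and all function names are hypothetical.

```python
from itertools import combinations


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def cook_subset(x, y, subset, p=2):
    """Cook's distance D_I for deleting the cases whose indices are in `subset`."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)
    keep = [i for i in range(n) if i not in subset]
    a0, a1 = fit_line([x[i] for i in keep], [y[i] for i in keep])
    # squared changes in all n fitted values, scaled by p * s^2
    return sum(((b0 + b1 * xi) - (a0 + a1 * xi)) ** 2 for xi in x) / (p * s2)


def most_influential_subset(x, y, size):
    """Exhaustive search for the subset of `size` cases with the largest D_I."""
    return max(combinations(range(len(x)), size),
               key=lambda s: cook_subset(x, y, set(s)))
```

With two identical outliers at x = 10, deleting either one alone changes the fit far less than deleting both, which is the masking effect that motivates assessing cases jointly. Exhaustive search grows combinatorially with the subset size, so it only suits small examples.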

11.
Goodness-of-fit Tests for GEE with Correlated Binary Data (total citations: 3; self-citations: 0; citations by others: 3)
The marginal logistic regression model, combined with GEE, is an increasingly important method for dealing with correlated binary data. As with independent binary data, when the number of possible combinations of covariate values in a logistic regression model is much larger than the sample size, such as when the model contains at least one continuous covariate, many existing chi-square goodness-of-fit tests either are not applicable or have serious drawbacks. In this paper, two residual-based normal goodness-of-fit test statistics are proposed: the Pearson chi-square and an unweighted sum of squared residuals. Easy-to-calculate approximations to the mean and variance of each statistic are also given. Their performance, in terms of both size and power, was satisfactory in our simulation studies. For illustration, we apply them to a real data set.
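For binary responses y with fitted marginal means μ from a GEE fit (observations from all clusters pooled into one list), the two statistics are X² = Σ (y − μ)² / (μ(1 − μ)) and USS = Σ (y − μ)². The sketch below computes both from values the caller supplies; it does not fit the GEE model. The hypothetical `naive_moments` helper returns only the simple expectations under a correct model (n for X², Σ μ(1 − μ) for USS), which ignore the effect of estimating the regression parameters and are not the paper's mean and variance approximations.

```python
def pearson_and_uss(y, mu):
    """Pearson chi-square and unweighted sum of squared residuals for binary data."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    pearson = 0.0
    uss = 0.0
    for yi, mi in zip(y, mu):
        r = yi - mi
        pearson += r * r / (mi * (1.0 - mi))  # squared Pearson residual
        uss += r * r                          # unweighted squared residual
    return pearson, uss


def naive_moments(mu):
    """Expectations of X^2 and USS if mu were the true means (no estimation effect)."""
    return float(len(mu)), sum(m * (1.0 - m) for m in mu)
```

The unweighted statistic gives less weight than the Pearson statistic to residuals at fitted means near 0 or 1, where μ(1 − μ) is small.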

12.
In the literature, traders are often classified as informed or uninformed, and trades by informed traders have market impact. We investigate these trades by first establishing a scheme that separates influential trades from ordinary trades under certain criteria. The differences between these two types of trades are examined via four transaction states classified by trade price, trade volume, quotes, and quoted depth. The marginal distribution of the four states and the transition probabilities between states are shown to differ between informed trades and ordinary liquidity trades. Furthermore, four market reaction factors are introduced, and logistic regression models of the influential trades are built on these four factors. An empirical study of high-frequency transaction data from the NYSE TAQ database shows high correct-classification rates for the logistic regression models.

13.
Cardiopulmonary cerebral resuscitation (CPCR) is a procedure to restore spontaneous circulation in patients with cardiopulmonary arrest (CPA). Although animals with CPA generally have a lower CPCR success rate than humans, CPCR studies in veterinary patients have been limited. In this paper, we construct a model for predicting the success or failure of CPCR and for identifying and evaluating factors that affect its success in veterinary patients. Because of reparametrization with multiple dummy variables, or because some variables are closely related in nature, many variables in the data form groups, so a desirable method should take this grouping into account in variable selection. To accomplish these goals, we propose an adaptive group bridge method for a logistic regression model. The performance of the proposed method is evaluated under different simulated setups and compared with several other regression methods. Using the logistic group bridge model, we analyze data from a CPCR study of veterinary patients and discuss the implications for the practice of veterinary medicine.

14.
Logistic regression is frequently used to classify observations into two groups. Unfortunately, data sets often contain outlying observations, and these may affect the estimated model and the associated classification error rate. In this paper, the authors study the effect of observations in the training sample on the error rate by deriving influence functions. They obtain a general expression for the influence function of the error rate, and they compute it for the maximum likelihood estimator as well as for several robust logistic discrimination procedures. Besides being of interest in their own right, the influence functions are also used to derive asymptotic classification efficiencies of different logistic discrimination rules. The authors also show how influential points can be detected by means of a diagnostic plot based on the values of the influence function.

15.
16.
Since the seminal paper by Cook (1977), in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most popular diagnostic measures based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper, we propose a further modification of Cook's distance for identifying a single influential observation. We then propose new measures for identifying multiple influential observations that are not affected by masking and swamping. The efficiency of the new statistics is demonstrated through several well-known data sets and a simulation study.
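For a straight-line least-squares fit (p = 2), Cook's distance and Atkinson's modified Cook statistic can be computed from leverages and residuals without refitting. The sketch below uses the closed forms D_i = e_i² h_ii / (p s² (1 − h_ii)²) and, as Atkinson's statistic is commonly stated, C_i = |t_i| √((n − p)/p · h_ii/(1 − h_ii)) with t_i the externally studentized residual. It illustrates only these standard single-case measures, not the further modification or the multiple-case measures proposed in the paper; the function name is hypothetical.

```python
import math


def cook_and_atkinson(x, y):
    """Cook's distance D_i and Atkinson's modified Cook statistic C_i for a line fit."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)
    cook, atk = [], []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx               # leverage h_ii
        cook.append(e * e * h / (p * s2 * (1 - h) ** 2))
        s2_i = (sse - e * e / (1 - h)) / (n - p - 1)     # variance without case i
        t = e / math.sqrt(s2_i * (1 - h))                # externally studentized residual
        atk.append(abs(t) * math.sqrt((n - p) / p * h / (1 - h)))
    return cook, atk
```

The closed form for D_i agrees with the deletion definition Σ_j (ŷ_j − ŷ_j(i))² / (p s²), so a brute-force refit is a convenient check. Like every single-case measure, both statistics can still be masked when several influential cases occur together.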

17.
Detecting outliers or influential observations is an important task in statistical modeling, especially for correlated time series data. In this paper, we propose a new procedure for detecting patches of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. First, we compare the performance of the innovative, additive, and data perturbation schemes in local influence analysis. We find that the innovative perturbation scheme gives better results than the other two, although it may suffer from masking effects. We then use the stepwise local influence method under the innovative perturbation scheme to detect patches of influential observations and uncover the masking effects. Simulation studies show that the new technique can successfully detect a patch of influential observations or outliers under the innovative perturbation scheme. Analyses of simulated data and two real data sets show that the stepwise local influence method under the innovative perturbation scheme is effective for detecting multiple influential observations and dealing with masking effects in the GARCH model.

18.
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detecting high leverage observations, or points, is crucial because they can mask outliers. In linear regression, high leverage points (HLP) are those that lie far from the center (mean) of the data, so the most extreme points in the covariate space receive the highest leverage. However, Hosmer and Lemeshow [Applied Logistic Regression, Wiley, New York, 1989] pointed out that in logistic regression the leverage measure contains a component that can make the leverage values of genuine HLP misleadingly small, which creates problems for correctly identifying these cases. Attempts have been made to identify HLP based on median distances from the mean, but since these methods are designed to identify a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper, we propose a new method for identifying multiple HLP in logistic regression, in which suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation.
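Hosmer and Lemeshow's point can be seen directly from the logistic-regression leverage h_i = v_i x_iᵀ (XᵀVX)⁻¹ x_i with v_i = π_i(1 − π_i): at an extreme covariate value π_i is close to 0 or 1, so v_i, and with it h_i, can be tiny. The sketch below evaluates this for an intercept-plus-slope model at coefficients supplied by the caller (assumed to come from a fit done elsewhere). It only illustrates the shrinking-leverage effect and is not the robust group-deletion method of the paper; the function name is hypothetical.

```python
import math


def logistic_leverage(x, beta0, beta1):
    """Diagonal of the logistic hat matrix V^(1/2) X (X'VX)^(-1) X' V^(1/2), rows (1, x_i)."""
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) for xi in x]
    v = [q * (1.0 - q) for q in probs]                 # binomial variance weights
    # entries of the 2x2 information matrix X'VX = [[a, b], [b, c]]
    a = sum(v)
    b = sum(vi * xi for vi, xi in zip(v, x))
    c = sum(vi * xi * xi for vi, xi in zip(v, x))
    det = a * c - b * b
    i00, i01, i11 = c / det, -b / det, a / det         # its inverse
    # h_i = v_i * (1, x_i) (X'VX)^(-1) (1, x_i)'
    return [vi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for vi, xi in zip(v, x)]
```

With coefficients (0, 2) and x = (−1, −0.5, 0, 0.5, 1, 8), the point at x = 8, which is by far the most remote in the covariate space, receives the smallest leverage of the six cases, while the leverages still sum to p = 2 as the trace of a hat matrix must.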

19.
This paper focuses on marginal regression models for correlated binary responses when estimation of the association structure is of primary interest. A new estimating function approach based on orthogonalized residuals is proposed. A special case of the proposed procedure allows a new representation of the alternating logistic regressions method through marginal residuals. The connections between second-order generalized estimating equations, alternating logistic regressions, pseudo-likelihood and other methods are explored. Efficiency comparisons are presented, with emphasis on variable cluster size and on the role of higher-order assumptions. The new method is illustrated with an analysis of data on impaired pulmonary function.

20.
Compared with other experimental studies, multicollinearity appears frequently in mixture experiments, a special area of response surface methodology, because of the constraints on the components of the mixture. When mixture experiments are analysed with a particular generalized linear model, the logistic regression model, multicollinearity causes precision problems in the maximum-likelihood logistic regression estimate. These effects can be reduced to some extent by alternative approaches, one of which is to use biased estimators of the coefficients. In this paper, we suggest using the logistic ridge regression (RR) estimator when multicollinearity is present in the analysis of mixture experiments with logistic regression. For selecting the biasing parameter, we use fraction of design space plots to evaluate the effect of the logistic RR estimator with respect to the scaled mean squared error of prediction. The suggested graphical approaches are illustrated on the tumor incidence data set.
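A logistic ridge estimator maximizes the penalized log-likelihood ℓ(β) − (λ/2)‖β‖². The sketch below does this by Newton–Raphson with step halving in plain Python. It penalizes every coefficient, including any intercept column (natural for Scheffé-type mixture models without an intercept, but an assumption here), leaves the choice of λ to the caller, and does not implement the fraction of design space plots the paper uses to select it. All names are hypothetical.

```python
import math


def sigmoid(e):
    """Numerically stable logistic function."""
    if e >= 0:
        return 1.0 / (1.0 + math.exp(-e))
    z = math.exp(e)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def penalized_loglik(X, y, beta, lam):
    """Bernoulli log-likelihood minus (lam/2) * ||beta||^2."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(xj * bj for xj, bj in zip(row, beta))
        # log(1 + exp(eta)) computed without overflow
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total - 0.5 * lam * sum(b * b for b in beta)


def logistic_ridge(X, y, lam, iters=100, tol=1e-10):
    """Ridge-penalized logistic regression fitted by damped Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        probs = [sigmoid(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
        grad = [sum(row[j] * (yi - q) for row, yi, q in zip(X, y, probs)) - lam * beta[j]
                for j in range(p)]
        info = [[sum(row[j] * row[k] * q * (1.0 - q) for row, q in zip(X, probs))
                 + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        # halve the Newton step until the penalized log-likelihood does not decrease
        t, old = 1.0, penalized_loglik(X, y, beta, lam)
        while t > 1e-8 and penalized_loglik(
                X, y, [b + t * s for b, s in zip(beta, step)], lam) < old - 1e-12:
            t /= 2.0
        beta = [b + t * s for b, s in zip(beta, step)]
        if max(abs(t * s) for s in step) < tol:
            break
    return beta
```

Because the penalty adds λ to the diagonal of the information matrix, the Newton system stays well conditioned even when columns of X are nearly collinear, and increasing λ shrinks ‖β̂‖; this is the bias traded for lower variance that the biasing parameter controls.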

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417