
1.

A comparative study on detection of influential observations in linear regression

A. Hossain D. N. Naik 《Statistical Papers》1991,32(1):55-69

A large number of statistics are used in the literature to detect outliers and influential observations in the linear regression model. In this paper comparison studies have been made to determine which statistic performs better than the others. This includes: (i) a detailed simulation study, and (ii) analyses of several data sets studied by different authors. Different choices of the design matrix of the regression model are considered. Design A studies the performance of the various statistics for detecting the scale shift type outliers, and designs B and C provide information on the performance of the statistics for identifying the influential observations. We have used cutoff points using the exact distributions and Bonferroni's inequality for each statistic. The results show that the studentized residual, which is used for detection of mean shift outliers, is appropriate for detection of scale shift outliers also, and Welsch's statistic and Cook's distance are appropriate for detection of influential observations.
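
As a rough illustration of two of the statistics named in this abstract, the sketch below computes leverages, externally studentized residuals and Cook's distance for a simple (one-predictor) linear regression. The function name `influence_diagnostics` and the toy data are assumptions made for this example, not taken from the paper, and no Bonferroni cutoff is computed because that would need t-distribution quantiles from an external library.

```python
import math


def influence_diagnostics(x, y):
    """Return (leverage, studentized residual, Cook's distance) per case
    for the simple linear regression y = b0 + b1 * x + error."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)  # residual variance from the full fit
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of this case
        # Residual variance with this case deleted (closed form, no refit).
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_del * (1.0 - h))  # externally studentized residual
        d = e * e * h / (p * s2 * (1.0 - h) ** 2)  # Cook's distance
        rows.append((h, t, d))
    return rows


# Hypothetical data: y is roughly 2x, with one shifted response at index 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 15.0, 12.1, 13.9, 16.2, 17.8, 20.1]
rows = influence_diagnostics(x, y)
worst_t = max(range(len(rows)), key=lambda i: abs(rows[i][1]))
worst_d = max(range(len(rows)), key=lambda i: rows[i][2])
print(worst_t, worst_d)
```

In this toy data both statistics single out the shifted case. In practice each studentized residual would be compared with a t-quantile at level α/(2n) to apply the Bonferroni bound mentioned in the abstract.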

2.

Procedures for the identification of multiple influential observations in linear regression

A.A.M. Nurunnabi Ali S. Hadi A.H.M.R. Imon 《Journal of applied statistics》2014,41(6):1315-1331

Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures that are based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.
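
The masking phenomenon mentioned in this abstract can be seen in a small sketch. It does not implement the measures proposed in the paper; it only compares how much a fitted slope changes under single-case deletion versus deletion of a pair. The helper `fit_slope` and the data are hypothetical.

```python
def fit_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(px for px, _ in points) / n
    y_bar = sum(py for _, py in points) / n
    sxx = sum((px - x_bar) ** 2 for px, _ in points)
    sxy = sum((px - x_bar) * (py - y_bar) for px, py in points)
    return sxy / sxx


# Eight clean points on the line y = x, plus two identical
# high-leverage points far from that line (assumed toy data).
clean = [(float(i), float(i)) for i in range(1, 9)]
data = clean + [(20.0, 0.0), (20.0, 0.0)]

slope_full = fit_slope(data)
slope_drop_one = fit_slope(data[:-1])   # delete one of the two points
slope_drop_both = fit_slope(data[:-2])  # delete the pair together

change_single = abs(slope_full - slope_drop_one)
change_pair = abs(slope_full - slope_drop_both)
print(change_single, change_pair)
```

Deleting either of the two repeated points alone moves the slope by less than 0.1, because the remaining twin still pulls the fit. Deleting both moves it by more than 1. A single-case deletion diagnostic therefore understates the influence of the pair.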

3.

Mean shift and influence measures in linear measurement error models with stochastic linear restrictions

F. Ghapani B. Babadi 《Communications in Statistics - Simulation and Computation》2017,46(6):4499-4512

We present influence diagnostics for linear measurement error models with stochastic linear restrictions using the corrected likelihood of Nakamura (1990). The case deletion and mean shift outlier models are developed to identify outlying and influential observations. We derive a corrected score test statistic for outlier detection based on mean shift outlier models. The analogs of Cook's distance and likelihood distance are proposed to determine influential observations based on case deletion models. A parametric bootstrap procedure is used to obtain empirical distributions of the test statistics, and a simulation study has been used to evaluate the performance of the proposed estimators based on the mean squared error criterion and the score test statistic. Finally, a numerical example is given to illustrate the theoretical results.

4.

Bayesian estimation and influence diagnostics of generalized partially linear mixed-effects models for longitudinal data

Xing-De Duan 《Statistics》2016,50(3):525-539

This paper develops a Bayesian approach to obtain the joint estimates of unknown parameters, nonparametric functions and random effects in generalized partially linear mixed models (GPLMMs), and presents three case deletion influence measures to identify influential observations based on the φ-divergence, Cook's posterior mean distance and Cook's posterior mode distance of parameters. Fisher's iterative scoring algorithm is developed to evaluate the posterior modes of parameters in GPLMMs. The first-order approximation to Cook's posterior mode distance is presented. The computationally feasible formulae for the φ-divergence diagnostic and Cook's posterior mean distance are given. Several simulation studies and an example are presented to illustrate our proposed methodologies.

5.

Influential observations in view of design and inference

Subir Ghosh 《Communications in Statistics - Theory and Methods》2013,42(14):1675-1683

In this paper we consider the measures for detecting the influential observations with respect to one or several parameters of interest at the design stage. We also consider Cook's measure for detecting the influential observations at the inference stage. We study the interrelationship between the two kinds of measures.

6.

Jackknife-After-Bootstrap as Logistic Regression Diagnostic Tool

Ufuk Beyaztas 《Communications in Statistics - Simulation and Computation》2013,42(9):2047-2060

In this study, we propose using the Jackknife-after-Bootstrap (JaB) method to detect influential observations in the binary logistic regression model. The performance of the proposed method has been compared with the traditional method for standardized Pearson residuals, Cook's distance, change in the Pearson chi-square and change in the deviance statistics using both real-world examples and simulation studies. The results reveal that under the various scenarios considered in this article, JaB performs better than the traditional method and is more robust to the masking effect, especially for Cook's distance.

7.

Identification of multiple influential observations in logistic regression

A.A.M. Nurunnabi A.H.M. Rahmatullah Imon M. Nasser 《Journal of applied statistics》2010,37(10):1605-1624

The identification of influential observations in logistic regression has drawn a great deal of attention in recent years. Most of the available techniques, like Cook's distance and difference of fits (DFFITS), are based on single-case deletion. But there is evidence that these techniques suffer from masking and swamping problems and consequently fail to detect multiple influential observations. In this paper, we have developed a new measure for the identification of multiple influential observations in logistic regression based on a generalized version of DFFITS. The advantage of the proposed method is then investigated through several well-referred data sets and a simulation study.

8.

On the asymptotic distribution of Cook's distance in logistic regression models

Nirian Martín Leandro Pardo 《Journal of applied statistics》2009,36(10):1119-1146

It sometimes occurs that one or more components of the data exert a disproportionate influence on the model estimation. We need a reliable tool for identifying such troublesome cases in order to decide either to eliminate them from the sample, when the data collection was badly carried out, or otherwise to take care in the use of the model because the results could be affected by such components. Since a measure for detecting influential cases in the linear regression setting was proposed by Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18.], apart from the same measure for other models, several new measures have been suggested as single-case diagnostics. For most of them some cutoff values have been recommended (see [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, (2004).], for instance); however, the lack of a quantile type cutoff for Cook's statistics has induced the analyst to deal only with index plots as worthy diagnostic tools. Focussed on logistic regression, the aim of this paper is to provide the asymptotic distribution of Cook's distance in order to look for a meaningful cutoff point for detecting influential and leverage observations.

9.

Cook's distance in linear longitudinal models

Mousumi Banerjee 《Communications in Statistics - Theory and Methods》2013,42(12):2973-2983

Cook's distance (1977) has become the standard influence diagnostic tool for analyzing cross-sectional regression studies. This paper introduces an analogue of Cook's distance in fixed effects models for longitudinal data. We demonstrate that this statistic is dominated by the effects of nuisance parameters, and hence its effectiveness as an influence measure in the longitudinal data setting is limited.

10.

Residual analysis and outliers in loglinear models based on phi-divergence statistics

A.K. Gupta T. Nguyen L. Pardo 《Journal of statistical planning and inference》2007

In this paper we consider new families of residuals and influential measures, under the assumption of multinomial sampling, for loglinear models. These new families are based on the φ-divergence test statistic. The asymptotic normality of the standardized residuals is obtained as well as the relation of the new family of influential measures with the appropriate Cook's distance in this context. The expression of the new family of residuals is obtained in two important problems: independence and symmetry in two-dimensional contingency tables. A numerical example illustrates the results obtained.

11.

On detecting influential data and selecting regression variables

《Journal of statistical planning and inference》1996,53(3):421-435

The analysis of residuals may reveal various functional forms suitable for the regression model. In this paper, we investigate some selection criteria for selecting important regression variables. In doing so, we use statistical selection and ranking procedures. Thus, we derive an appropriate criterion to measure the influence and bias for the reduced models. We show that the reduced models are based on some noncentrality parameters which provide a measure of goodness of fit for the fitted models. In this paper, we also discuss the relationships of influence diagnostics and the statistic proposed earlier by Gupta and Huang (J. Statist. Plann. Inference 20 (1988) 155–167). We introduce a new measure for detecting influential data as an alternative to Cook's measure.

12.

Pena's statistic for the Liu regression

Muhammad Kashif Muhammad Amanullah 《Journal of Statistical Computation and Simulation》2018,88(13):2473-2488

In fitting a regression model, one or more observations may have substantial effects on the estimators. These unusual observations are precisely detected by a new diagnostic measure, Pena's statistic. In this article, we introduce a type of Pena's statistic for each point in Liu regression. Using the forecast change property, we simplify Pena's statistic in a numerical sense. It is found that the simplified Pena's statistic behaves quite well as far as detection of influential observations is concerned. We express Pena's statistic in terms of the Liu leverages and residuals. The normality of this statistic is also discussed, and it is demonstrated that it can identify a subset of high Liu leverage outliers. For numerical evaluation, simulation studies are given and a real data set has been analysed for illustration.

13.

Graphical and Numerical Methods for Detecting Influential Observations in Complex Bingham Data

Getulio Jose Amorim Amaral Olga Patricia Reyes Floréz Francisco José A Cysneiros 《Communications in Statistics - Simulation and Computation》2013,42(8):1801-1814

We describe methods to detect influential observations in a sample of pre-shapes when the underlying distribution is assumed to be complex Bingham. One of these methods is based on Cook's distance, which is derived from the likelihood of the complex Bingham distribution. Another method is related to the tangent space and is based on the local influence for the multivariate normal distribution. A method to detect outliers is also explained. The application of the methods is illustrated on both a real dataset and a simulated sample.

14.

The Relationship Between the T² Statistic and the Influence Function

Robert L. Mason Youn-Min Chou John C. Young 《Communications in Statistics - Theory and Methods》2014,43(13):2844-2857

Hotelling's T² statistic has many applications in multivariate analysis. In particular, it can be used to measure the influence that a particular observation vector has on parameter estimation. For example, in the bivariate case, there exists a direct relationship between the ellipse generated using a T² statistic for individual observations and the hyperbolae generated using Hampel's influence function for the corresponding correlation coefficient. In this paper, we jointly use the components of an orthogonal decomposition of the T² statistic and some influence functions to identify outliers or influential observations. Since the conditional components in the T² statistic are related to the possible changes in the correlation between a variable and a group of other variables, we consider the theoretical influence functions of the correlations and multiple correlation coefficients. Finite-sample versions of these influence functions are used to find the estimated influence function values.
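
For readers unfamiliar with the T² statistic for individual observations, here is a minimal bivariate sketch. It computes only the overall quadratic form for each observation, not the orthogonal decomposition or the influence functions studied in the paper. The function name `hotelling_t2` and the data are assumptions made for this example.

```python
def hotelling_t2(points):
    """T^2 value of each bivariate observation relative to the sample mean,
    using the sample covariance matrix S (divisor n - 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    t2 = []
    for px, py in points:
        dx, dy = px - mx, py - my
        # Quadratic form d' S^{-1} d with the 2x2 inverse written out.
        t2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return t2


# Hypothetical data: six strongly correlated points and one point
# (the last) that breaks the correlation pattern.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 4.2),
          (5.0, 4.8), (6.0, 6.1), (3.5, 7.0)]
t2 = hotelling_t2(points)
most_extreme = max(range(len(t2)), key=lambda i: t2[i])
print(most_extreme, t2)
```

A contour of constant T² is an ellipse around the sample mean, which is the ellipse the abstract refers to. The last point has an unremarkable x value but lies far outside the ellipse followed by the others, so it receives the largest T².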

15.

Local influence analysis for regression models with scale mixtures of skew-normal distributions

C. B. Zeller F. E. Vilca-Labra 《Journal of applied statistics》2011,38(2):343-368

The robust estimation and the local influence analysis for linear regression models with scale mixtures of multivariate skew-normal distributions have been developed in this article. The main virtue of considering the linear regression model under the class of scale mixtures of skew-normal distributions is that they have a nice hierarchical representation which allows an easy implementation of inference. Inspired by the expectation maximization algorithm, we have developed a local influence analysis based on the conditional expectation of the complete-data log-likelihood function, which is a measurement invariant under reparametrizations. This is because the observed data log-likelihood function associated with the proposed model is somewhat complex and with Cook's well-known approach it can be very difficult to obtain measures of the local influence. Some useful perturbation schemes are discussed. In order to examine the robust aspect of this flexible class against outlying and influential observations, some simulation studies have also been presented. Finally, a real data set has been analyzed, illustrating the usefulness of the proposed methodology.

16.

The distribution of Cook's D statistic

Keith E. Muller Mario Chen Mok 《Communications in Statistics - Theory and Methods》2013,42(3):525-546

Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values.

We describe the exact distribution of Cook's statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations.

17.

Diagnostic tools in generalized Weibull linear regression models

Luis Hernando Vanegas Gauss M. Cordeiro 《Journal of Statistical Computation and Simulation》2013,83(12):2315-2338

We propose some statistical tools for diagnosing the class of generalized Weibull linear regression models [A.A. Prudente and G.M. Cordeiro, Generalized Weibull linear models, Comm. Statist. Theory Methods 39 (2010), pp. 3739–3755]. This class of models is an alternative means of analysing positive, continuous and skewed data and, due to its statistical properties, is very competitive with gamma regression models. First, we show that the Weibull model induces maximum likelihood estimators asymptotically more efficient than the gamma model. Standardized residuals are defined, and their statistical properties are examined empirically. Some measures are derived based on the case-deletion model, including the generalized Cook's distance and measures for identifying influential observations on partial F-tests. The results of a simulation study conducted to assess the behaviour of the global influence approach are also presented. Further, we perform a local influence analysis under the case-weights, response and explanatory variables perturbation schemes. The Weibull, gamma and other Weibull-type regression models are fitted to three data sets to illustrate the proposed diagnostic tools. Statistical analyses indicate that the Weibull model fitted to these data yields better fits than other common alternative models.

18.

The Prediction Sum of Squares as a General Measure for Regression Diagnostics

Nguyen T. Quan 《Journal of Business & Economic Statistics》2013,31(4):501-504

Statistics that usually accompany the regression model do not provide insight into the quality of the data or the potential influence of the individual observations on the estimates. In this study, the Q² statistic is used as a criterion for detecting influential observations or outliers. The statistic is derived from the jackknifed residuals, the squared sum of which is generally known as the prediction sum of squares or PRESS. This article compares R² with Q² and suggests that the latter be used as part of the data-quality check. It is shown, for two separate data sets obtained from regional cost of living and U.S. food industry studies, that in the presence of outliers the Q² statistic can be negative, because it is sensitive to the choice of regressors and the inclusion of influential observations. Once the outliers are dropped from the sample, the discrepancy between Q² and R² values is negligible.
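
The behaviour described in this abstract can be reproduced in a small sketch, assuming the common definition Q² = 1 − PRESS/SST (the abstract does not spell out the formula) and taking the jackknifed residual of case i to be e_i / (1 − h_i). The function `r2_and_q2` and the data are hypothetical and unrelated to the cost-of-living or food-industry data sets used in the article.

```python
def r2_and_q2(x, y):
    """Return (R^2, Q^2) for the simple linear regression of y on x,
    with Q^2 = 1 - PRESS / SST (assumed definition)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = 0.0
    press = 0.0
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)                # ordinary residual
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage
        sse += e * e
        press += (e / (1.0 - h)) ** 2          # squared deleted residual
    return 1.0 - sse / sst, 1.0 - press / sst


# Five nearly collinear points plus one high-leverage outlier (toy data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.0, 2.1, 2.9, 4.2, 5.1, 2.0]

r2_all, q2_all = r2_and_q2(x, y)
r2_clean, q2_clean = r2_and_q2(x[:-1], y[:-1])
print(r2_all, q2_all)
print(r2_clean, q2_clean)
```

With the outlier included, Q² is negative because the deleted residual at the high-leverage point is very large. Once it is dropped, R² and Q² nearly coincide. Since each deleted residual is at least as large in magnitude as the ordinary one, Q² can never exceed R².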

19.

A new statistic for detecting influential observations in a Scheffé type calibration curve

Clifford H. Spiegelman 《Australian & New Zealand Journal of Statistics》1984,26(3):290-297

A statistic for identifying influential observations in calibration is given. The statistic is easy to interpret, and provides a useful measure of influence for Scheffé type calibration curves.

20.

Influence on tests with focus on linear models

Christian Ritz Ib M. Skovgaard 《Journal of statistical planning and inference》2007

To assess the influence of single observations on the parameter estimates, case-deletion diagnostics are commonly used in linear regression models; one example is Cook's distance. For nested parametric models we consider a deletion diagnostic for evaluating the influence of a single observation on the likelihood ratio (LR) test. In order to have a common scale as reference, the asymptotic distribution of the diagnostic is derived and the values of the diagnostic are converted to percentiles. We focus on linear models and general linear models, and in these cases explicit results are derived. The performance of the diagnostic is explored in two small benchmark examples from linear regression and in a larger linear mixed model example.