1.

Identifying multiple influential observations in linear regression

A. H. M. Rahmatullah Imon 《Journal of applied statistics》2005,32(9):929-946

The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most of these identification techniques are based on single case deletion and among them DFFITS has become very popular with the statisticians. But this technique along with all other single case diagnostics may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and then propose a new technique to identify multiple influential observations using this. The advantage of using the proposed method in the identification of multiple influential cases is then investigated through several well-referred data sets.
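As a point of reference for the single-case diagnostic that the abstract above generalizes, here is a minimal sketch of classical DFFITS for a simple linear regression. It is not the group-deletion version proposed in the paper. The function name `dffits_simple`, the toy data, and the conventional cutoff 2*sqrt(p/n) are assumptions made for illustration.

```python
import math

def dffits_simple(x, y):
    """Classical single-case DFFITS for the model y = a + b*x (illustrative helper)."""
    n, p = len(x), 2  # p counts the intercept and the slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    scores = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx               # leverage of case i
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)   # error variance with case i deleted
        t = e / math.sqrt(s2_del * (1.0 - h))              # externally studentized residual
        scores.append(t * math.sqrt(h / (1.0 - h)))        # DFFITS_i
    return scores

# Toy data (assumed): nine points near y = 2x + 1 plus one discordant high-leverage point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, 0.1, -0.05]
y = [2 * xi + 1 + ni for xi, ni in zip(x[:9], noise)] + [10.0]

scores = dffits_simple(x, y)
cutoff = 2 * math.sqrt(2 / len(x))  # conventional rule of thumb, not taken from the paper
flagged = [i for i, d in enumerate(scores) if abs(d) > cutoff]
```

With a single discordant point this measure behaves well. The masking problem the abstract describes arises when several such points sit together, so that deleting any one of them barely changes the fit.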

2.

Leverages and Influential Observations in a Regression Model with Autocorrelated Errors

M. Revan Özkale Dr. Tuğba Söküt Açar 《Communications in Statistics - Theory and Methods》2013,42(11):2267-2290

This article deals with the general form of the hat matrix and the DFBETA measure to detect the influential observations and the leverages in the linear regression model with more than one regressor when the errors are from AR(1) and AR(2) processes. Previous studies dealing with the influential observations and the leverages in the constant mean model and regression through the origin model are obtained as special cases. To demonstrate the utility of the hat matrix and the DFBETA measure, two numerical examples based on the ice cream consumption data with AR(1) errors and the Fox-Hartnagel data with AR(2) errors are analyzed. The results show that the parameter of the autoregressive process affects the influential and leverage points.

3.

Identification of multiple influential observations in logistic regression

A. A.M. Nurunnabi A. H.M. Rahmatullah Imon M. Nasser 《Journal of applied statistics》2010,37(10):1605-1624

The identification of influential observations in logistic regression has drawn a great deal of attention in recent years. Most of the available techniques like Cook's distance and difference of fits (DFFITS) are based on single-case deletion. But there is evidence that these techniques suffer from masking and swamping problems and consequently fail to detect multiple influential observations. In this paper, we have developed a new measure for the identification of multiple influential observations in logistic regression based on a generalized version of DFFITS. The advantage of the proposed method is then investigated through several well-referred data sets and a simulation study.

4.

A multiple-case deletion approach for detecting influential points in high-dimensional regression

Tao Wang Qun Li Qingpei Zang 《Communications in Statistics - Simulation and Computation》2013,42(7):2065-2082

In high-dimensional regression, the presence of influential observations may lead to inaccurate analysis results so that it is a prime and important issue to detect these unusual points before statistical regression analysis. Most of the traditional approaches are, however, based on single-case diagnostics, and they may fail due to the presence of multiple influential observations that suffer from masking effects. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure contains two stages. Firstly, we propose a multiple-case deletion technique, and obtain an approximate clean subset of the data that is presumably free of influential observations. To enhance efficiency, in the second stage, we refine the detection rule. Monte Carlo simulation studies and a real-life data analysis investigate the effective performance of the proposed procedure.

5.

Procedures for the identification of multiple influential observations in linear regression

A.A.M. Nurunnabi Ali S. Hadi A.H.M.R. Imon 《Journal of applied statistics》2014,41(6):1315-1331

Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures that are based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of the Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.
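For orientation, the standard single-case Cook's distance that the abstract above starts from can be written as follows. This is the textbook form, not the modified measures proposed in the paper. The notation is assumed here: $X$ is the $n \times p$ design matrix, $\hat\beta_{(i)}$ the least-squares estimate with case $i$ deleted, $e_i$ the ordinary residual, $h_{ii}$ the $i$-th leverage, and $s^2$ the residual mean square.

```latex
D_i \;=\; \frac{(\hat\beta_{(i)} - \hat\beta)^{\top} X^{\top} X \,(\hat\beta_{(i)} - \hat\beta)}{p\, s^{2}}
    \;=\; \frac{e_i^{2}}{p\, s^{2}} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}}
```

The second form shows why a case needs both a sizeable residual and non-negligible leverage to register as influential under this measure.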

6.

INFLUENCE DIAGNOSTICS FOR THE NORMAL LINEAR MODEL WITH CENSORED DATA

L.A. Weissfeld H. Schneider 《Australian & New Zealand Journal of Statistics》1990,32(1):11-20

Methods of detecting influential observations for the normal model for censored data are proposed. These methods include one-step deletion methods, deletion of observations and the empirical influence function. Emphasis is placed on assessing the impact that a single observation has on the estimation of coefficients of the model. Functions of the coefficients such as the median lifetime are also considered. Results are compared when applied to two sets of data.

7.

A Comparison of Permutation Hotelling's T² Test and Log-Ratio Test for Analyzing Compositional Data

Deo Kumar Srivastava James M. Boyett Carl W. Jackson Xin Tong Shesh N. Rai 《Communications in Statistics - Theory and Methods》2013,42(2):415-431

An adding-back model is constructed for studying multiple outliers and influential observations. A logarithmic functional form for some influence measures having a better justification for plotting purposes is suggested. Two graphical methods with contours of constant measure values are proposed. They provide valuable information about the interrelationship of multiple influential observations and influence measures.

8.

Detecting influential observations in Liu and modified Liu estimators

Hasan Ertas Selahattin Kaciranlar 《Journal of applied statistics》2013,40(8):1735-1745

In regression, detecting anomalous observations is a significant step in the model-building process. Various influence measures based on different motivational arguments are designed to measure the influence of observations through different aspects of various regression models. The presence of influential observations in the data is complicated by the existence of multicollinearity. The purpose of this paper is to assess the influence of observations in the Liu [9] and modified Liu [15] estimators by using the method of approximate case deletion formulas suggested by Walker and Birch [14]. A numerical example using a real data set used by Longley [10] and a Monte Carlo simulation are given to illustrate the theoretical results.

9.

Local Influence Analysis in AB–BA Crossover Designs

Chengcheng Hao Dietrich von Rosen Tatjana von Rosen 《Scandinavian Journal of Statistics》2014,41(4):1153-1166

The aim of this article is to develop methodology for detecting influential observations in crossover models with random individual effects. Various case‐weighted perturbations are performed. We obtain the influence of the perturbations on each parameter estimator and on their dispersion matrices. The obtained results exhibit the possibility to obtain closed‐form expressions of the influence using the residuals in mixed linear models. Some graphical tools are also presented.

10.

Multiple cases deletion measures in linear measurement error models

Karim Zare 《Communications in Statistics - Theory and Methods》2019,48(4):954-963

In this paper, we define a multiple cases deletion model (MCDM) in linear measurement error models (LMEMs). Then, by using the corrected score method of Nakamura (1990), the estimation of parameters is obtained. Furthermore, based on MCDM, we provide computationally inexpensive deletion diagnostic tools for LMEMs. An example illustrates that our method is useful for diagnosing influential subsets of observations.

11.

Deletion residuals in the detection of heterogeneity of variances in linear regression

A. H.M. Rahmatullah Imon 《Journal of applied statistics》2009,36(3):347-358

The heterogeneity of error variance often causes a huge interpretive problem in linear regression analysis. Before taking any remedial measures we first need to detect this problem. A large number of diagnostic plots are now available in the literature for detecting heteroscedasticity of error variances. Among them the ‘residuals’ and ‘fits’ (R–F) plot is very popular and commonly used. In the R–F plot residuals are plotted against the fitted responses, where both these components are obtained using the ordinary least squares (OLS) method. It is now evident that the OLS fits and residuals suffer a huge setback in the presence of unusual observations and hence the R–F plot may not exhibit the real scenario. The deletion residuals based on a data set free from all unusual cases should estimate the true errors in a better way than the OLS residuals. In this paper we propose ‘deletion residuals’ and the ‘deletion fits’ (DR–DF) plot for the detection of the heterogeneity of error variances in a linear regression model to get a more convincing and reliable graphical display. Examples show that this plot locates unusual observations more clearly than the R–F plot. The advantage of using deletion residuals in the detection of heteroscedasticity of error variance is investigated through Monte Carlo simulations under a variety of situations.

12.

A GRAPHICAL TECHNIQUE FOR DETECTING INFLUENTIAL CASES IN REGRESSION ANALYSIS

《Communications in Statistics - Theory and Methods》2013,42(3):463-483

This paper presents a graphical technique for detecting influential cases in regression analysis. The idea is to decompose a diagnostic problem involving higher order dimensional regression problems into a series of two-dimensional diagnostic sub-problems, such that the diagnosis of influential cases is undertaken by visually inspecting two-dimensional diagnostic plots of these sub-problems. An algorithm for the graphical procedure is proposed to reduce the computational effort. Practical examples are used to illustrate this graphical technique.

13.

Regression Diagnostic under Model Misspecification

Li-Chu Chien Tsung-Shan Tsou 《Journal of applied statistics》2007,34(5):563-575

We propose two novel diagnostic measures for the detection of influential observations for regression parameters in linear regression. Traditional diagnostic statistics focus on the effect of deletion of data points either on parameter estimates, or on predicted values. A data point is regarded as influential by the new methods if its inclusion determines a significantly different likelihood function for the parameter of interest. The concerned likelihood function is asymptotically valid for practically all underlying distributions whose second moments exist.

14.

Deletion diagnostics for generalized linear models using the adjusted Poisson likelihood function

Li-Chu Chien Tsung-Shan Tsou 《Journal of statistical planning and inference》2011,141(6):2044-2054

In this article, we propose two novel diagnostic measures for the deletion of influential observations for regression parameters in the setting of generalized linear models. The proposed diagnostic methods are capable of detecting the influential observations under model misspecification, as long as the true underlying distributions have finite second moments. More specifically, it is demonstrated that the Poisson likelihood function can be properly adjusted to become asymptotically valid for practically all underlying discrete distributions. The adjusted Poisson regression model that achieves the robustness property is presented. Simulation studies and an illustration are performed to demonstrate the efficacy of the two novel diagnostic procedures.

15.

Generalized Weibull Linear Models

Andrea A. Prudente 《Communications in Statistics - Theory and Methods》2013,42(20):3739-3755

For the first time, a new class of generalized Weibull linear models is introduced to be competitive to the well-known generalized (gamma and inverse Gaussian) linear models which are adequate for the analysis of positive continuous data. The proposed models have a constant coefficient of variation for all observations similar to the gamma models and may be suitable for a wide range of practical applications in various fields such as biology, medicine, engineering, and economics, among others. We derive a joint iterative algorithm for estimating the mean and dispersion parameters. We obtain closed form expressions in matrix notation for the second-order biases of the maximum likelihood estimates of the model parameters and define bias corrected estimates. The corrected estimates are easily obtained as vectors of regression coefficients in suitable weighted linear regressions. The practical use of the new class of models is illustrated in one application to a lung cancer data set.

16.

Bayesian parameter estimation via variational methods

Jaakkola Tommi S. Jordan Michael I. 《Statistics and Computing》2000,10(1):25-37

We consider a logistic regression model with a Gaussian prior distribution over the parameters. We show that an accurate variational transformation can be used to obtain a closed form approximation to the posterior distribution of the parameters, thereby yielding an approximate posterior predictive model. This approach is readily extended to binary graphical models with complete observations. For graphical models with incomplete observations we utilize an additional variational transformation and again obtain a closed form approximation to the posterior. Finally, we show that the dual of the regression problem gives a latent variable density model, the variational formulation of which leads to exactly solvable EM updates.
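The abstract above does not spell out the variational transformation. The sketch below shows the quadratic-exponent lower bound on the logistic function that is commonly attributed to this work. Because the bound is the exponential of a quadratic in x, combining it with a Gaussian prior gives a Gaussian (closed form) approximation. The function names and the test grid are assumptions for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lam(xi):
    # lambda(xi) = tanh(xi / 2) / (4 * xi), with limiting value 1/8 as xi -> 0
    return 0.125 if xi == 0 else math.tanh(xi / 2.0) / (4.0 * xi)

def sigmoid_lower_bound(x, xi):
    # sigma(x) >= sigma(xi) * exp((x - xi) / 2 - lambda(xi) * (x^2 - xi^2)),
    # where xi is the variational parameter; equality holds at x = +xi and x = -xi.
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - lam(xi) * (x * x - xi * xi))

# Largest violation of the bound over a small grid (should not be positive).
grid_x = [-4.0, -1.0, 0.0, 0.5, 3.0]
grid_xi = [0.1, 1.0, 2.5]
worst = max(sigmoid_lower_bound(x, xi) - sigmoid(x) for x in grid_x for xi in grid_xi)
```

In practice the variational parameter xi is tuned per observation to tighten the bound. That optimization is omitted here.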

17.

Local linear regression with adaptive orthogonal fitting for the wind power application

Pierre Pinson Henrik Aa. Nielsen Henrik Madsen Torben S. Nielsen 《Statistics and Computing》2008,18(1):59-71

Short-term forecasting of wind generation requires a model of the function for the conversion of meteorological variables (mainly wind speed) to power production. Such a power curve is nonlinear and bounded, in addition to being nonstationary. Local linear regression is an appealing nonparametric approach for power curve estimation, for which the model coefficients can be tracked with recursive Least Squares (LS) methods. This may lead to an inaccurate estimate of the true power curve, owing to the assumption that a noise component is present on the response variable axis only. Therefore, this assumption is relaxed here, by describing a local linear regression with orthogonal fit. Local linear coefficients are defined as those which minimize a weighted Total Least Squares (TLS) criterion. An adaptive estimation method is introduced in order to accommodate nonstationarity. This has the additional benefit of lowering the computational costs of updating local coefficients every time new observations become available. The estimation method is based on tracking the left-most eigenvector of the augmented covariance matrix. A robustification of the estimation method is also proposed. Simulations on semi-artificial datasets (for which the true power curve is available) underline the properties of the proposed regression and related estimation methods. An important result is the significantly higher ability of local polynomial regression with orthogonal fit to accurately approximate the target regression, even though it may hardly be visible when calculating error criteria against corrupted data.
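To make the contrast between an ordinary least squares fit and an orthogonal (total least squares) fit concrete, here is a minimal sketch for a single straight line in two dimensions. It is unweighted, global, and non-adaptive, so it illustrates only the TLS criterion and not the local, recursive estimator the paper describes. The function names and the data are assumptions.

```python
import math

def centered_sums(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return xbar, ybar, sxx, syy, sxy

def ols_line(x, y):
    # Minimizes vertical distances: noise is assumed on the response axis only.
    xbar, ybar, sxx, _, sxy = centered_sums(x, y)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def orthogonal_line(x, y):
    # Minimizes perpendicular distances (equal noise variance on both axes).
    # Closed form for the 2-D case; assumes sxy != 0.
    xbar, ybar, sxx, syy, sxy = centered_sums(x, y)
    d = syy - sxx
    slope = (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)
    return slope, ybar - slope * xbar

# Assumed toy data: noise on y only for brevity; the slope ordering checked
# below holds for any data with sxy != 0.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 3.8, 7.1, 9.9, 13.2, 15.7]
ols_slope, _ = ols_line(x, y)
tls_slope, _ = orthogonal_line(x, y)
```

The fitted orthogonal line is normal to the eigenvector of the 2x2 scatter matrix with the smallest eigenvalue. In magnitude, the orthogonal slope is never smaller than the OLS slope on the same data.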

18.

Influence of groups of observations on Bayes factors

K. D. S. Young 《Communications in Statistics - Theory and Methods》2013,42(5):1405-1426

A diagnostic for finding groups of observations influential on Bayes factors is discussed, which extends ideas in Pettit & Young (1990). Ways of reducing the combinatorial explosion involved in detecting more than one influential observation are considered. The effect of masking is also examined. Finally new graphical displays to identify these observations will be explored.

19.

Identification and classification of multiple outliers, high leverage points and influential observations in linear regression

A.A.M. Nurunnabi M. Nasser A.H.M.R. Imon 《Journal of applied statistics》2016,43(3):509-525

Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians due to the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments through several well-referred data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

20.

Sensitivity measures of influence on the loading matrix in exploratory factor analysis

Eduardo Castaño-Tostado Yutaka Tanaka 《Communications in Statistics - Theory and Methods》2013,42(4):1329-1343

For the detection of influential observations on the loading matrix of the factor analysis model, we propose to use the infinitesimal version of two matrix coefficients, including that of Escoufier (1973). Also discussed is the application in factor analysis of some sensitivity measures used for similar purposes in principal component analysis.