1.

Influential observations in principal component analysis: a case study

P. Pack, I. T. Jolliffe, B. J. T. Morgan 《Journal of applied statistics》1988,15(1):39-52

A number of results have been derived recently concerning the influence of individual observations in a principal component analysis. Some of these results, particularly those based on the correlation matrix, are applied to data consisting of seven anatomical measurements on students. The data have a correlation structure that is fairly typical of many found in allometry. This case study shows that theoretical influence functions often provide good estimates of the actual changes observed when individual observations are deleted from a principal component analysis. Different observations may be influential for different aspects of the principal component analysis (coefficients, variances and scores of principal components); these differences, and the distinction between outlying and influential observations, are discussed in the context of the case study. A number of other complications, such as switching and rotation of principal components when an observation is deleted, are also illustrated.
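The deletion idea in this abstract can be made concrete in the smallest possible case. The sketch below assumes only two variables, so the correlation matrix is [[1, r], [r, 1]] and the variance of the first principal component is 1 + |r|; it deletes each observation in turn and reports the actual change in that variance. The function names and the two-variable setting are illustrative assumptions, not the authors' code, and the theoretical influence functions the paper compares against are not computed here.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_pc_variance(x, y):
    """Largest eigenvalue of the 2x2 correlation matrix [[1, r], [r, 1]], i.e. 1 + |r|."""
    return 1.0 + abs(correlation(x, y))


def deletion_changes(x, y):
    """Actual change in the first PC variance when each observation is deleted in turn."""
    full = first_pc_variance(x, y)
    changes = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        changes.append(first_pc_variance(xs, ys) - full)
    return changes
```

With the hypothetical data x = [1, 2, 3, 4, 5, 6] and y = [1.1, 2.0, 2.9, 4.2, 5.1, 1.0], deleting the last observation raises the first-component variance far more than any other deletion, because that single point is what weakens the correlation.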

2.

Influential observations in GARCH models

《Journal of Statistical Computation and Simulation》2012,82(11):1571-1589

This paper examines local influence assessment in generalized autoregressive conditional heteroscedasticity (GARCH) models with Gaussian and Student-t errors, where influence is examined via the likelihood displacement. The analysis of local influence is discussed under three perturbation schemes: data perturbation, innovative model perturbation and additive model perturbation. For each case, expressions for slope and curvature diagnostics are derived. Monte Carlo experiments are presented to determine threshold values for locating influential observations. The empirical study of daily returns of the New York Stock Exchange composite index shows that local influence analysis is a useful technique for detecting influential observations; most of the observations detected as influential are associated with historical shocks in the market. Finally, based on this empirical study and the analysis of simulated data, some advice is given on how to use the methodology discussed.
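Data perturbation in a GARCH model acts through the conditional-variance recursion, so a single shocked return changes every later variance. A minimal sketch for a Gaussian GARCH(1,1), assuming fixed, hypothetical parameter values (omega, alpha, beta) rather than fitted ones, and starting the recursion at the unconditional variance; it is not the slope or curvature diagnostic derived in the paper.

```python
import math


def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].

    The recursion starts at the unconditional variance omega / (1 - alpha - beta),
    which assumes alpha + beta < 1 (covariance stationarity).
    """
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def gaussian_loglik(returns, omega, alpha, beta):
    """Gaussian GARCH(1,1) log-likelihood evaluated at fixed parameters."""
    ll = 0.0
    for r, s2 in zip(returns, garch11_variances(returns, omega, alpha, beta)):
        ll += -0.5 * (math.log(2.0 * math.pi) + math.log(s2) + r * r / s2)
    return ll
```

Replacing one return in a short series with a large value leaves the variances up to and including that time point unchanged, raises all later ones, and lowers the log-likelihood, which is the kind of displacement the local-influence diagnostics measure.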

3.

Influence measures in affine combination type regression

M. Revan Özkale 《Journal of applied statistics》2013,40(10):2219-2243

The detection of outliers and influential observations has received a great deal of attention in the statistical literature in the context of least-squares (LS) regression. However, the explanatory variables can be correlated with each other, and alternatives to LS have emerged to address outliers/influential observations and multicollinearity simultaneously. This paper proposes new influence measures based on affine combination type regression for detecting influential observations in the linear regression model when multicollinearity exists. Approximate influence measures are also proposed for affine combination type regression. Since affine combination type regression includes the ridge, Liu and shrunken regressions as special cases, influence measures under these regressions are also examined to see the possible effect that multicollinearity can have on the influence of an observation. The Longley data set is used to illustrate the influence measures in affine combination type regression and in the ridge, Liu and shrunken regressions, so that the performance of the different biased regressions in detecting and assessing influential observations can be examined.
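Ridge and Liu regression, two of the special cases named in the abstract, have closed forms when there is a single predictor and no intercept (an assumption made here so that X'X is a scalar). The sketch computes both estimators, the ridge leverages x_i^2 / (x'x + k), and the actual change in each estimate when one case is deleted. It is not the affine combination measure proposed in the paper, and the helper names are hypothetical.

```python
def ridge(x, y, k):
    """Ridge estimate for the no-intercept model y = b * x + e (single predictor)."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sxx + k)


def liu(x, y, d):
    """Liu estimate (X'X + I)^{-1} (X'y + d * b_LS) for the same one-predictor model."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return (sxy + d * sxy / sxx) / (sxx + 1.0)


def ridge_leverages(x, k):
    """Diagonal of the ridge hat matrix x (x'x + k)^{-1} x'."""
    sxx = sum(a * a for a in x)
    return [a * a / (sxx + k) for a in x]


def deletion_change(estimator, x, y, i, param):
    """Full-data estimate minus the estimate refitted without observation i."""
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    return estimator(x, y, param) - estimator(xs, ys, param)
```

Setting d = 1 in the Liu estimator or k = 0 in ridge recovers least squares, and increasing k shrinks every nonzero leverage, which is one way these biased estimators change how influential a point can look. In this no-intercept setup, deleting a case with x = 0 leaves both estimates unchanged.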

4.

Procedures for the identification of multiple influential observations in linear regression

A.A.M. Nurunnabi, Ali S. Hadi, A.H.M.R. Imon 《Journal of applied statistics》2014,41(6):1315-1331

Since the seminal paper by Cook (1977), in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of Cook's distance for identifying a single influential observation. We then propose new measures for identifying multiple influential observations that are not affected by the masking and swamping problems. The efficiency of the new statistics is demonstrated on several well-known data sets and in a simulation study.
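For reference, Cook's distance in simple linear regression (p = 2 parameters) can be computed from the residuals e_i, the leverages h_ii and the residual mean square s^2 as D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2). The sketch below covers only this single-case statistic; the modifications and multiple-case measures proposed in the paper are not reproduced, and the function name is an assumption.

```python
def cooks_distance(x, y):
    """Cook's distance for simple linear regression y = b0 + b1 * x + e (p = 2)."""
    n, p = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)            # residual mean square
    lev = [1.0 / n + (a - xbar) ** 2 / sxx for a in x]  # hat-matrix diagonal h_ii
    return [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, lev)]
```

For x = [1, 2, 3, 4, 5, 10] and y = [1, 2, 3, 4, 5, 0], the high-leverage point at x = 10 has by far the largest distance even though its raw residual is not the largest, because it pulls the fitted line toward itself.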

5.

Influence of groups of observations on Bayes factors

K. D. S. Young 《Communications in Statistics: Theory and Methods》2013,42(5):1405-1426

A diagnostic for finding groups of observations that are influential on Bayes factors is discussed, extending ideas in Pettit & Young (1990). Ways of reducing the combinatorial explosion involved in detecting more than one influential observation are considered. The effect of masking is also examined. Finally, new graphical displays for identifying these observations are explored.

6.

Robust confirmatory factor analysis based on the forward search algorithm

Aleš Toman 《Statistical Papers》2014,55(1):233-252

A key concept of the forward search algorithm in confirmatory factor analysis is ordering of the data on the basis of observational residuals. These residuals are computed under the proposed model and measure the discrepancy between the observed and predicted response for each unit of the sample. Regression-type factor scores are used to estimate model predictions. Informative forward plots are created for indexing influential observations and to show the dynamics of the estimates throughout the search. The detailed influence of each observation on the model parameters and fit indices is analyzed and a robust model inference is achieved. Real and simulated data sets with known contamination schemes are used to demonstrate the performance of the forward search algorithm.
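The forward search grows a fitted subset one unit at a time and records when each unit enters. The sketch below is a deliberately simplified univariate location version, assuming the "model" is just the subset mean and each unit's residual is its absolute distance from that mean; confirmatory factor analysis residuals and factor scores are not modelled, and the start size m0 is a hypothetical tuning choice.

```python
import statistics


def forward_search(data, m0):
    """Simplified univariate forward search for a location parameter.

    Returns the order in which units first enter the fitted subset and the
    subset mean at each step (the quantity a forward plot would show).
    """
    n = len(data)
    med = statistics.median(data)
    # Robust start: the m0 units closest to the median.
    subset = sorted(range(n), key=lambda i: abs(data[i] - med))[:m0]
    order = list(subset)
    estimates = []
    while len(subset) < n:
        center = statistics.mean([data[i] for i in subset])
        estimates.append(center)
        # Next subset: the m + 1 units with the smallest residuals from the current fit.
        ranked = sorted(range(n), key=lambda i: abs(data[i] - center))
        subset = ranked[:len(subset) + 1]
        order.extend([i for i in subset if i not in order])
    estimates.append(statistics.mean(data))
    return order, estimates
```

On data with two gross outliers, those two units enter last and the final estimate jumps, which is the pattern a forward plot of the estimates makes visible.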

7.

A Poisson-gamma model for two-stage cluster sampling data

《Journal of Statistical Computation and Simulation》2012,82(2):161-172

We propose a model for count data from two-stage cluster sampling, where observations within each cluster are subject simultaneously to internal influences and to external factors at the cluster level. This model can be seen as a two-stage hierarchical model with local and global predictors. This parameter-driven model causes the counts within a cluster to share a common latent factor and hence to be correlated. Maximum likelihood (ML) estimation based on an EM algorithm is discussed. A simulation study is carried out to assess the benefit of using ML estimates compared with a standard Poisson regression analysis that ignores the within-cluster correlation.
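Why a shared latent factor induces within-cluster correlation can be seen from the laws of total expectation, variance and covariance. Assume, as one common parameterisation and not necessarily the paper's, that counts in cluster i are conditionally independent with Y_ij | u_i ~ Poisson(mu_ij u_i), and that u_i ~ Gamma with shape a and rate a, so E[u_i] = 1 and Var(u_i) = 1/a:

```latex
\begin{aligned}
\mathbb{E}[Y_{ij}] &= \mathbb{E}\big[\mathbb{E}[Y_{ij}\mid u_i]\big] = \mu_{ij}\,\mathbb{E}[u_i] = \mu_{ij},\\
\operatorname{Var}(Y_{ij}) &= \mathbb{E}\big[\operatorname{Var}(Y_{ij}\mid u_i)\big] + \operatorname{Var}\big(\mathbb{E}[Y_{ij}\mid u_i]\big) = \mu_{ij} + \frac{\mu_{ij}^2}{a},\\
\operatorname{Cov}(Y_{ij}, Y_{ik}) &= \mathbb{E}\big[\operatorname{Cov}(Y_{ij}, Y_{ik}\mid u_i)\big] + \operatorname{Cov}\big(\mu_{ij}u_i,\ \mu_{ik}u_i\big) = 0 + \frac{\mu_{ij}\mu_{ik}}{a}, \qquad j \neq k.
\end{aligned}
```

A standard Poisson regression implicitly sets the covariance to zero and the variance to mu_ij, so it ignores both the overdispersion and the within-cluster correlation that u_i introduces under these assumptions.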

8.

Pena's statistic for the Liu regression

Muhammad Kashif, Muhammad Amanullah 《Journal of Statistical Computation and Simulation》2018,88(13):2473-2488

In fitting a regression model, one or more observations may have substantial effects on the estimators. Such unusual observations can be detected precisely by a new diagnostic measure, Pena's statistic. In this article, we introduce a version of Pena's statistic for each point in Liu regression. Using the forecast change property, we simplify Pena's statistic numerically. The simplified Pena's statistic is found to perform well in detecting influential observations. We express Pena's statistic in terms of the Liu leverages and residuals. The normality of this statistic is also discussed, and it is shown that the statistic can identify a subset of high-Liu-leverage outliers. For numerical evaluation, simulation studies are reported and a real data set is analysed for illustration.
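Peña's statistic for case i collects how far the fitted value at i moves when each case j is deleted, S_i = sum_j (yhat_i - yhat_i(j))^2 / (p s^2 h_ii). For ordinary least squares the forecast change has the closed form yhat_i - yhat_i(j) = h_ij e_j / (1 - h_jj), so no refitting is needed. The sketch below assumes OLS simple linear regression rather than the Liu version studied in the paper, and checks the closed form against brute-force deletion; all function names are hypothetical.

```python
def fit(x, y):
    """OLS intercept and slope for y = b0 + b1 * x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1


def _residual_summary(x, y):
    """Shared quantities: n, p, xbar, Sxx, coefficients, residuals and s^2."""
    n, p = len(x), 2
    xbar = sum(x) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b0, b1 = fit(x, y)
    e = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2 = sum(r * r for r in e) / (n - p)
    return n, p, xbar, sxx, b0, b1, e, s2


def pena_statistic(x, y):
    """Closed form: uses yhat_i - yhat_i(j) = h_ij * e_j / (1 - h_jj)."""
    n, p, xbar, sxx, _, _, e, s2 = _residual_summary(x, y)
    h = [[1.0 / n + (x[i] - xbar) * (x[j] - xbar) / sxx for j in range(n)]
         for i in range(n)]
    return [sum((h[i][j] * e[j] / (1.0 - h[j][j])) ** 2 for j in range(n))
            / (p * s2 * h[i][i]) for i in range(n)]


def pena_by_deletion(x, y):
    """Brute force: refit with each case j deleted and record the forecast change at i."""
    n, p, xbar, sxx, b0, b1, _, s2 = _residual_summary(x, y)
    deleted = [fit(x[:j] + x[j + 1:], y[:j] + y[j + 1:]) for j in range(n)]
    out = []
    for i in range(n):
        hii = 1.0 / n + (x[i] - xbar) ** 2 / sxx
        changes = [(b0 + b1 * x[i]) - (c0 + c1 * x[i]) for c0, c1 in deleted]
        out.append(sum(v * v for v in changes) / (p * s2 * hii))
    return out
```

The two functions agree to rounding error, which is the practical payoff of the forecast change identity: one fit replaces n refits.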

9.

Residuals in the Extended Growth Curve Model

Jemila Seid Hamid, Dietrich von Rosen 《Scandinavian Journal of Statistics》2006,33(1):121-138

The Extended Growth Curve model is considered. The estimated mean of the model turns out to be the projection of the observations on the space generated by the design matrices, which is the sum of two tensor product spaces. The orthogonal complement of this space is decomposed into four orthogonal spaces, and residuals are defined by projecting the observation matrix on the resulting components. The residuals are interpreted, and remarks are given on why ordinary residuals should not be used, what kind of information our residuals give, and how this information might be used to validate model assumptions and to detect outliers and influential observations. It is shown that the residuals are symmetrically distributed around zero and are uncorrelated with each other. The covariance between the residuals and the estimated model, as well as the dispersion matrices of the residuals, are also given.

10.

Bayesian change-point problem using Bayes factor with hierarchical prior distribution

Myoungjin Jung, Seongho Song 《Communications in Statistics: Theory and Methods》2017,46(3):1352-1366

We consider hierarchical Bayesian models for the change-point problem in a sequence of random variables drawn from either a normal or a skew-normal population. We further consider the problem of detecting an influential point with respect to the change point using Bayes factors. The proposed models are illustrated with a real data example, the annual flow volume of the Nile River at Aswan from 1871 to 1970. Results from the proposed models identify the year 1888 as the most influential observation among the outliers. We show that it is useful to measure the influence of observations on Bayes factors. Omitting single observations is also considered.

11.

Influence diagnostics for censored regression models with autoregressive errors

Fernanda L. Schumacher, Victor H. Lachos, Filidor E. Vilca‐Labra, Luis M. Castro 《Australian & New Zealand Journal of Statistics》2018,60(2):209-229

Observations collected over time are often autocorrelated rather than independent, and sometimes include observations below or above detection limits (i.e. censored values reported as less or more than a level of detection) and/or missing data. Practitioners commonly disregard censored data cases or replace these observations with some function of the limit of detection, which often results in biased estimates. Moreover, parameter estimation can be greatly affected by the presence of influential observations in the data. In this paper we derive local influence diagnostic measures for censored regression models with autoregressive errors of order p (hereafter, AR(p)‐CR models) on the basis of the Q‐function under three useful perturbation schemes. To account for censoring in a likelihood‐based estimation procedure for AR(p)‐CR models, we use a stochastic approximation version of the expectation‐maximisation algorithm. The accuracy of the local influence diagnostic measure in detecting influential observations is explored through the analysis of empirical studies. The proposed methods are illustrated using data from a study of total phosphorus concentration that contain left‐censored observations. These methods are implemented in the R package ARCensReg.

12.

Influence diagnostics for the normal linear model with censored data

L.A. Weissfeld, H. Schneider 《Australian & New Zealand Journal of Statistics》1990,32(1):11-20

Methods of detecting influential observations for the normal model with censored data are proposed. These methods include one-step deletion methods, deletion of observations and the empirical influence function. Emphasis is placed on assessing the impact that a single observation has on the estimation of the model coefficients. Functions of the coefficients, such as the median lifetime, are also considered. The methods are compared by applying them to two data sets.

13.

A multiple-case deletion approach for detecting influential points in high-dimensional regression

Tao Wang, Qun Li, Qingpei Zang 《Communications in Statistics: Simulation and Computation》2013,42(7):2065-2082

In high-dimensional regression, influential observations may lead to inaccurate analysis results, so detecting these unusual points before regression analysis is an important issue. Most traditional approaches, however, are based on single-case diagnostics, and they may fail in the presence of multiple influential observations because of masking effects. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure has two stages. First, we propose a multiple-case deletion technique and obtain an approximately clean subset of the data that is presumably free of influential observations. Second, to enhance efficiency, we refine the detection rule. Monte Carlo simulation studies and a real-life data analysis demonstrate the effective performance of the proposed procedure.
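Masking, the failure mode of single-case deletion described here, can be shown with a toy univariate example rather than a high-dimensional regression. The hypothetical data contain two identical outliers; deleting one of them leaves the other in place to inflate the mean and standard deviation, while deleting both together exposes them. The deletion t-statistic used is an illustrative assumption, not the paper's measure.

```python
import statistics


def deletion_t(data, deleted, i):
    """|data[i] - mean(rest)| / sd(rest), where rest excludes every index in `deleted`."""
    rest = [v for k, v in enumerate(data) if k not in deleted]
    return abs(data[i] - statistics.mean(rest)) / statistics.stdev(rest)


# Hypothetical data: seven well-behaved values and two identical outliers.
data = [0.0, 1.0, -1.0, 0.5, -0.5, 0.2, -0.2, 10.0, 10.0]
single = deletion_t(data, {7}, 7)    # delete case 7 alone: case 8 still inflates mean and sd
group = deletion_t(data, {7, 8}, 7)  # delete both suspects together
```

Here the single-deletion statistic is only about 2.4, while the group-deletion statistic is about 15, which is why multiple-case procedures first look for a clean subset before judging individual points.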

14.

Bounds for how much influence an observation can have

Ingram Olkin, Adi Raveh 《Statistical Methods and Applications》2009,18(1):1-11

That outliers or influential observations can affect the results in a regression is well-known. But it is not clear how much influence a specific observation can have on other statistics. In time series, especially in predictive situations, the effect of additional observations is of singular importance. We here examine bounds for the effect of an additional observation on the mean, variance, Mahalanobis distance, product moment correlation, and coefficients of linearity and monotonicity.
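The effect of one additional observation on the mean and variance can be written exactly, which is the natural starting point for bounding it. The sketch below uses the standard Welford-style updates for the mean and the sample variance (divisor n - 1); it does not reproduce the paper's bounds, and the example data are hypothetical.

```python
import statistics


def add_observation(n, mean, var, x_new):
    """Exact mean and sample variance (divisor n - 1) after appending x_new to n >= 2 values."""
    new_mean = mean + (x_new - mean) / (n + 1)
    # Welford-style update of M2 = (n - 1) * var, the sum of squared deviations.
    m2 = (n - 1) * var + (x_new - mean) * (x_new - new_mean)
    return new_mean, m2 / n  # new sample variance has divisor (n + 1) - 1 = n


# Hypothetical example: five values, then one additional large observation.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
m, v = add_observation(len(data), statistics.mean(data), statistics.variance(data), 10.0)
```

The mean moves by exactly (x_new - mean) / (n + 1), so the effect of a single extra point on it shrinks as n grows, while its effect on the variance grows with the squared distance of x_new from the current mean.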

15.

Influence Measures in Quantile Regression Models

Bruno R. Santos, Silvia N. Elian 《Communications in Statistics: Theory and Methods》2013,42(9):1842-1853

In this article, we use the asymmetric Laplace distribution to define a new method for determining the influence of an observation on the fit of quantile regression models. Our measure is based on the likelihood displacement function, and we propose two types of measures to determine influential observations either in a set of conditional quantiles jointly or in each conditional quantile of interest. We verify the validity of our average measure on a simulated data set as well as in an illustrative example with air pollution data.
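A likelihood-displacement measure under the asymmetric Laplace working likelihood can be sketched for the simplest quantile model, an intercept only. Assumptions: the scale sigma is held at its full-data maximum likelihood value sigma_hat = (1/n) sum rho_tau(y_i - theta_hat), only single-case deletion is considered, and the fit is found by searching the data points (valid because the check-loss objective is convex and piecewise linear). This is not the authors' measure for general quantile regression, and the function names are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile(y, tau):
    """Intercept-only quantile fit: a data point minimising sum rho_tau(y - theta)."""
    return min(y, key=lambda t: sum(check_loss(v - t, tau) for v in y))


def ald_loglik(y, theta, sigma, tau):
    """Asymmetric Laplace log-likelihood with location theta and scale sigma."""
    n = len(y)
    return (n * math.log(tau * (1.0 - tau) / sigma)
            - sum(check_loss(v - theta, tau) for v in y) / sigma)


def likelihood_displacement(y, tau):
    """LD_i = 2 * [l(theta_hat) - l(theta_hat_(i))], with sigma fixed at its full-data MLE."""
    theta = fit_quantile(y, tau)
    sigma = sum(check_loss(v - theta, tau) for v in y) / len(y)
    full = ald_loglik(y, theta, sigma, tau)
    out = []
    for i in range(len(y)):
        theta_i = fit_quantile(y[:i] + y[i + 1:], tau)
        out.append(2.0 * (full - ald_loglik(y, theta_i, sigma, tau)))
    return out
```

Because theta_hat maximises the full-data likelihood, every displacement is nonnegative; large values flag cases whose removal moves the fitted quantile.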

16.

Bayesian analysis of outlier problems using divergence measures

Fengchun Peng, Dipak K. Dey 《Revue canadienne de statistique》1995,23(2):199-213

A Bayesian approach is presented for detecting influential observations using general divergence measures on the posterior distributions. A sampling-based approach using a Gibbs or Metropolis-within-Gibbs method is used to compute the posterior divergence measures. Four specific measures are proposed, which convey the effects of a single observation or covariate on the posterior. The technique is applied to a generalized linear model with binary response data, an overdispersed model and a nonlinear model. An asymptotic approximation using the Laplace method to obtain the posterior divergence is also briefly discussed.
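The divergence idea can be illustrated without Gibbs sampling when the posterior is available in closed form. The sketch assumes a normal mean with known variance and a conjugate normal prior, and uses the Kullback-Leibler divergence, one member of the general divergence family, between the full posterior and each case-deleted posterior. The prior settings and function names are hypothetical.

```python
import math


def normal_posterior(y, sigma2, m0, v0):
    """Conjugate posterior N(m, v) for a normal mean with known variance sigma2."""
    v = 1.0 / (1.0 / v0 + len(y) / sigma2)
    m = v * (m0 / v0 + sum(y) / sigma2)
    return m, v


def kl_normal(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for univariate normals."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def case_deletion_kl(y, sigma2=1.0, m0=0.0, v0=100.0):
    """KL divergence from the full posterior to each case-deleted posterior."""
    m, v = normal_posterior(y, sigma2, m0, v0)
    out = []
    for i in range(len(y)):
        mi, vi = normal_posterior(y[:i] + y[i + 1:], sigma2, m0, v0)
        out.append(kl_normal(m, v, mi, vi))
    return out
```

Because every deletion reduces the posterior precision by the same amount in this model, the ranking of cases is driven entirely by how far each one moves the posterior mean.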

17.

Testing for Heteroscedasticity and/or Autocorrelation in Longitudinal Mixed Effect Nonlinear Models with AR(1) Errors

Jin-Guan Lin, Bo-Cheng Wei 《Communications in Statistics: Theory and Methods》2013,42(3):567-586

The effect of influential observations on the parameter estimates of ordinary least squares regression models has received considerable attention in the last decade. However, very little attention has been given to the problem of influential observations in the analysis of variance. The purpose of this paper is to show by way of examples that influential observations can alter the conclusions of tests of hypotheses in the analysis of variance. Regression diagnostics for identifying both extreme points and outliers can be used to reveal potential data and design problems.

18.

Stepwise local influence in generalized autoregressive conditional heteroskedasticity models

Lei Shi, Md. Mostafizur Rahman, Wen Gan, Jianhua Zhao 《Journal of applied statistics》2015,42(2):428-444

Detection of outliers or influential observations is an important task in statistical modeling, especially for correlated time series data. In this paper we propose a new procedure to detect patches of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. First, we compare the performance of the innovative, additive and data perturbation schemes in local influence analysis. We find that the innovative perturbation scheme gives better results than the other two schemes, although it may suffer from masking effects. We then use the stepwise local influence method under the innovative perturbation scheme to detect patches of influential observations and uncover the masking effects. Simulation studies show that the new technique can successfully detect a patch of influential observations or outliers under the innovative perturbation scheme. The analyses of simulated data and two real data sets show that the stepwise local influence method under the innovative perturbation scheme is efficient for detecting multiple influential observations and dealing with masking effects in the GARCH model.

19.

A robust variable screening method for high-dimensional data

Tao Wang, Lin Zheng, Haiyang Liu 《Journal of applied statistics》2017,44(10):1839-1855

In practice, influential observations may lead to misleading results in variable screening problems. We therefore propose a robust variable screening procedure for high-dimensional data analysis. Our method consists of two steps. The first step defines a new high-dimensional influence measure and a novel influence diagnostic procedure to remove unusual observations. The second step uses the sure independence screening procedure based on distance correlation to select important variables in high-dimensional regression analysis. The new influence measure and diagnostic procedure are model-free. To confirm the effectiveness of the proposed method, we conduct simulation studies and a real-life data analysis that illustrate its merits over some competing methods. Both the simulation results and the real-life data analysis show that, by detecting and removing unusual observations, the proposed method greatly reduces their adverse effect and performs better than the competing methods.
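The second step relies on distance correlation, which, unlike Pearson correlation, is zero in the population only under independence. The sketch computes the sample distance correlation from double-centred distance matrices and ranks predictors by it, assuming a univariate response and a list of predictor columns; the paper's influence-removal step is not shown, and dc_sis is a hypothetical name.

```python
import math


def _centered_distances(v):
    """Double-centred pairwise distance matrix used by distance covariance."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means.
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)] for j in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two univariate samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sample carries no information about dependence
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


def dc_sis(columns, y, keep):
    """Rank predictor columns by distance correlation with y; return the top `keep` indices."""
    scores = [distance_correlation(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])[:keep]
```

For x symmetric about zero and y = x^2 the Pearson correlation is exactly zero, but the distance correlation is positive, so a screening rule built on it can still retain x.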

20.

Strong-influence characteristics of China's provincial GDP and their formation mechanism

Zhou Jian, Zhang Min 《Statistical Research》2014,31(9):37-43

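Using recent, cutting-edge theory and methods of macroeconomic data diagnostics and spatial panel models, this paper empirically analyses the strong-influence characteristics of provincial GDP for 30 Chinese provinces (excluding Tibet) over 1999-2011, and on this basis applies a Beveridge-Nelson (B-N) decomposition to study in depth how these characteristics are formed. The main findings are as follows. China's provincial GDP is not only spatially correlated but also exhibits significant strong-influence characteristics: the degree and magnitude of the influence that each year's sample information exerts on the information for the full sample period differ across years. Moreover, these strong-influence characteristics have a highly significant formation mechanism: the stochastic component is the main source of strong influence, and the cyclical component also has a fairly important effect. These conclusions carry important policy implications for assessing the strong-influence characteristics of China's provincial GDP, and offer practical guidance for the government in formulating and refining policies to smooth economic fluctuations and achieve steady, relatively rapid macroeconomic growth.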