
1.

A multiple-case deletion approach for detecting influential points in high-dimensional regression

Tao Wang Qun Li Qingpei Zang 《Communications in Statistics - Simulation and Computation》2013,42(7):2065-2082

ABSTRACT

In high-dimensional regression, influential observations can lead to inaccurate results, so detecting these unusual points before the regression analysis is an important issue. Most traditional approaches, however, are based on single-case diagnostics and may fail when multiple influential observations are present and mask one another. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure has two stages. First, we propose a multiple-case deletion technique and obtain an approximately clean subset of the data that is presumably free of influential observations. Second, to enhance efficiency, we refine the detection rule. Monte Carlo simulation studies and a real-life data analysis demonstrate the performance of the proposed procedure.

2.

A diagnostic of influential cases based on the information complexity criteria in generalized linear mixed models

Junfeng Shang 《Communications in Statistics - Theory and Methods》2013,42(13):3751-3760

ABSTRACT

Modeling diagnostics assess models by means of a variety of criteria. Each criterion typically performs its evaluation with respect to a specific inferential objective. For instance, the well-known DFBETAS in linear regression models are a modeling diagnostic used to discover the influential cases in fitting a model. To facilitate the evaluation of generalized linear mixed models (GLMM), we develop a diagnostic based on the information complexity (ICOMP) criterion for detecting influential cases that substantially affect the model selection criterion ICOMP. In a given model, the diagnostic compares the ICOMP criterion between the full data set and a case-deleted data set. The computational formula of the ICOMP criterion is evaluated using the Fisher information matrix. A simulation study is conducted and a real data set of cancer cells is analyzed using the logistic linear mixed model to illustrate the effectiveness of the proposed diagnostic in detecting influential cases.

3.

Influence diagnostics for stratified ordinal contingency tables

《Journal of Statistical Computation and Simulation》2012,82(5):405-415

Influence diagnostics are investigated in this study. In particular, an approach based on the generalized linear mixed model setting is presented for formulating ordered categorical counts in stratified contingency tables. Deletion diagnostics and their first-order approximations are developed for assessing the stratum-specific influence on parameter estimates in the models. To illustrate the proposed model diagnostic technique, the method is applied to analyze two sets of data: a clinical trial and a survey study. The two examples demonstrate that the presence of influential strata may substantially change the results in ordinal contingency table analysis.

4.

Influential observations in GARCH models

《Journal of Statistical Computation and Simulation》2012,82(11):1571-1589

This paper examines local influence assessment in generalized autoregressive conditional heteroscedasticity models with Gaussian and Student-t errors, where influence is examined via the likelihood displacement. The analysis of local influence is discussed under three perturbation schemes: data perturbation, innovative model perturbation and additive model perturbation. For each case, expressions for slope and curvature diagnostics are derived. Monte Carlo experiments are presented to determine the threshold values for locating influential observations. The empirical study of daily returns of the New York Stock Exchange composite index shows that local influence analysis is a useful technique for detecting influential observations; most of the observations detected as influential are associated with historical shocks in the market. Finally, based on this empirical study and the analysis of simulated data, some advice is given on how to use the discussed methodology.

5.

Robust multivariate diagnostics for PLSR and application on high dimensional spectrally overlapped drug systems

Aylin Alin Claudio Agostinelli Georgi Gergov Plamen Katsarov Yahya Al-Degs 《Journal of Statistical Computation and Simulation》2019,89(6):966-984

ABSTRACT

Statistical methods are effectively used in the evaluation of pharmaceutical formulations instead of laborious liquid chromatography. However, signal overlapping, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for quantifying high-dimensional, spectrally overlapped drug formulations. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality for a PLSR model. We study the performance of the proposed diagnostics on two everyday-use, highly overlapping drug systems: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

6.

Q plots, a graphical aid for regression analysis

Derek O. Chalton Cas G. Troskie 《Communications in Statistics - Theory and Methods》2013,42(3):625-636

Plots are presented which are based on the singular value decomposition of the augmented data matrix in regression. In general, these plots assist in identifying discrepant observations, and in conjunction with associated diagnostics they are useful for identifying influential observations.

7.

Identifying multiple influential observations in linear regression

A. H. M. Rahmatullah Imon 《Journal of applied statistics》2005,32(9):929-946

The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most of these identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. But this technique, along with all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and then propose a new technique that uses it to identify multiple influential observations. The advantage of using the proposed method in the identification of multiple influential cases is then investigated through several well-referred data sets.
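
For orientation, the classical single-case DFFITS that this abstract takes as its starting point can be sketched as follows. This is only the textbook single-deletion statistic, not the group-deletion generalization proposed in the paper, whose formula is not given here. The function name `dffits`, the simulated data, and the 2*sqrt(p/n) rule-of-thumb cutoff are assumptions made for illustration.

```python
import numpy as np

def dffits(X, y):
    """Classical single-case DFFITS for an OLS fit (X includes the intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
    h = np.diag(H)                          # leverages h_ii
    e = y - H @ y                           # ordinary residuals
    s2 = e @ e / (n - p)                    # residual variance, full data
    # Residual variance with case i deleted, obtained without refitting
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1 - h))       # externally studentized residuals
    return t * np.sqrt(h / (1 - h))

# Simulated data, purely illustrative
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

d = dffits(X, y)
# Common rule of thumb (an assumption here, not taken from the abstract)
flagged = np.flatnonzero(np.abs(d) > 2 * np.sqrt(X.shape[1] / n))
```

Because each case is deleted on its own, two influential points lying close together can hide each other, which is the masking problem that motivates group deletion.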

8.

Using Liu estimator for detection of influential observations in linear measurement error models

Fatemeh Ghapani 《Communications in Statistics - Theory and Methods》2013,42(19):4748-4763

Abstract

In this paper, we introduce the Liu estimator for the vector of parameters in linear measurement error models and discuss its asymptotic properties. Based on the Liu estimator, diagnostic measures are developed to identify influential observations. Additionally, analogs of Cook’s distance and the likelihood distance are proposed to determine influential observations using the case deletion approach. A parametric bootstrap procedure is used to obtain empirical distributions of the test statistics. Finally, the performance of the influence measures is illustrated through a simulation study and the analysis of a real data set.

9.

Influence measure for the L1 regression

Silvia N. Elian Carmen D.S. André Subhash C. Narula 《Communications in Statistics - Theory and Methods》2013,42(4):837-849

Because outliers and leverage observations unduly affect the least squares regression, the identification of influential observations is considered an important and integral part of the analysis. However, very few techniques have been developed for residual analysis and diagnostics for the minimum sum of absolute errors (L₁) regression. Although the L₁ regression is more resistant to outliers than the least squares regression, it appears that outliers (leverage) in the predictor variables may affect it. In this paper, our objective is to develop an influence measure for the L₁ regression based on the likelihood displacement function. We illustrate the proposed influence measure with examples.

10.

Influence diagnostics in gamma ridge regression model

Muhammad Amin Muhammad Amanullah Muhammad Aslam Muhammad Qasim 《Journal of Statistical Computation and Simulation》2019,89(3):536-556

In this article, we propose some influence diagnostics for the gamma regression model (GRM) and the gamma ridge regression model (GRRM). We assess the impact of influential observations on the GRM and GRRM estimates by extending the work of Pregibon [Logistic regression diagnostics. Ann Stat. 1981;9:705–724] and Walker and Birch [Influence measures in ridge regression. Technometrics. 1988;30:221–227]. The two models are compared with the help of a simulation study and a real data set. We report some momentous results in detecting the influential observations and their effects on the GRM and GRRM estimates.

11.

Two graphical displays for the detection of potentially influential subsets in regression

Ali S. Hadi 《Journal of applied statistics》1990,17(3):313-327

In the context of the general linear model Y = Xβ + ε, the matrix P_Z = Z(Z^T Z)^{-1} Z^T, where Z = (X : Y), plays an important role in determining least squares results. In this article we propose two graphical displays for the off-diagonal as well as the diagonal elements of P_Z. The two graphs are based on simple ideas and are useful in the detection of potentially influential subsets of observations in regression. Since P_Z is invariant with respect to permutations of the columns of Z, an added advantage of these graphs is that they can be used to detect outliers in multivariate data where the rows of Z are usually regarded as a random sample from a multivariate population. We also suggest two calibration points, one for the diagonal elements of P_Z and the other for the off-diagonal elements. The advantage of these calibration points is that they take into consideration the variability of the off-diagonal as well as the diagonal elements of P_Z. They also do not suffer from masking.
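
As a minimal sketch of the matrix this abstract is built on, the projection onto the column space of the augmented matrix Z = (X : Y) can be computed as below. The function name `projection_matrix` and the simulated data are assumptions for illustration; the paper's graphical displays and calibration points are not reproduced here.

```python
import numpy as np

def projection_matrix(X, y):
    """Return P_Z = Z (Z^T Z)^{-1} Z^T for the augmented matrix Z = (X : y)."""
    Z = np.column_stack([X, y])
    # Solve (Z^T Z) B = Z^T instead of forming the inverse explicitly
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Simulated data, purely illustrative
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=20)

P = projection_matrix(X, y)
diag_elements = np.diag(P)                 # one value per observation
off_diag = P[np.triu_indices(20, k=1)]     # one value per pair of observations

# Permuting the columns of Z leaves P_Z unchanged
Z = np.column_stack([X, y])
Zp = Z[:, [3, 1, 0, 2]]
P_perm = Zp @ np.linalg.solve(Zp.T @ Zp, Zp.T)
```

The last lines check numerically the column-permutation invariance that the abstract relies on when extending the displays to multivariate outlier detection.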

12.

Influence diagnostics in a vector autoregressive model

Yonghui Liu Guocheng Ji 《Journal of Statistical Computation and Simulation》2015,85(13):2632-2655

In this paper, we use a likelihood approach and the local influence method introduced by Cook [Assessment of local influence (with discussion). J Roy Statist Soc Ser B. 1986;48:133–149] to study a vector autoregressive (VAR) model. We present the maximum likelihood estimators and the information matrix. We establish the normal curvature and slope diagnostics for the VAR model under several perturbation schemes and use the Monte Carlo method to obtain benchmark values for determining the influence of directional diagnostics and possible influential observations. An empirical study using the VAR model to fit real data of monthly returns of IBM and S&P500 index illustrates the effectiveness of our proposed diagnostics.

13.

A study of the strong-influence characteristics of China's provincial GDP and their formation mechanism

Jian Zhou Min Zhang 《Statistical Research》2014,31(9):37-43

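Using recent theory and methods for macroeconomic data diagnostics and spatial panel models, this paper empirically analyzes the strong-influence characteristics of provincial GDP for 30 Chinese provinces (excluding Tibet) over 1999-2011, and then applies the B-N decomposition to study how these characteristics are formed. The main findings show that China's provincial GDP is not only spatially correlated but also exhibits significant strong-influence characteristics, and that the degree and magnitude of the influence of each year's sample information on the information in the full sample period differ. Moreover, the strong-influence characteristics of provincial GDP have a highly significant formation mechanism: the stochastic component is the main source of strong influence, and the cyclical component also has a fairly important effect. These conclusions have important policy implications for assessing the strong-influence characteristics of China's provincial GDP, and offer practical guidance for formulating and refining government policies aimed at smoothing economic fluctuations and achieving stable and relatively rapid macroeconomic growth.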

14.

A note on efficient simulation of multidimensional spatial autoregressive processes

Philipp Otto 《Communications in Statistics - Simulation and Computation》2017,46(6):4547-4558

In applications of spatial statistics, it is necessary to compute the product of some matrix W of spatial weights and a vector y of observations. The weighting matrix often needs to be adapted to the specific problem, so that the computation of Wy cannot necessarily be done with available R packages. Hence, this article suggests one possible way of treating such issues. The proposed technique avoids the computation of the matrix product by calculating each entry of Wy separately. Initially, a specific spatial autoregressive process is introduced. The performance of the proposed program is briefly compared to a basic program using matrix multiplication.
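
The idea of calculating each entry of Wy separately, without ever storing W, can be sketched roughly as follows. The inverse-distance weight function, the names `weight` and `spatial_lag`, and the random site coordinates are all assumptions made for illustration; the article's own program and its specific spatial autoregressive process are not reproduced here.

```python
import numpy as np

def weight(i, j, coords):
    """Hypothetical spatial weight: inverse distance between sites i and j, zero on the diagonal."""
    if i == j:
        return 0.0
    return 1.0 / np.linalg.norm(coords[i] - coords[j])

def spatial_lag(y, coords):
    """Compute Wy one entry at a time, never storing the n x n matrix W."""
    n = len(y)
    out = np.zeros(n)
    for i in range(n):
        # Entry i of Wy is the weighted sum over all sites j
        out[i] = sum(weight(i, j, coords) * y[j] for j in range(n))
    return out

# Simulated sites on a plane, purely illustrative
rng = np.random.default_rng(2)
coords = rng.uniform(size=(50, 2))
y = rng.normal(size=50)
Wy = spatial_lag(y, coords)
```

The entrywise version needs only O(n) memory for the result, whereas the dense product requires the full n x n weight matrix; whether it is also faster depends on the implementation language and on the weight function.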

15.

INFLUENTIAL OBSERVATION IDENTIFICATION IN THE GROWTH CURVE MODEL WITH RAO'S SIMPLE COVARIANCE STRUCTURE

《Communications in Statistics - Theory and Methods》2013,42(5):813-831

ABSTRACT

In this paper we discuss the identification of influential observations in a growth curve model with Rao's simple covariance structure. Based on the generalized Cook-type distance and the volume of a confidence ellipsoid, a variety of influence measures are proposed in terms of the case-deletion technique. Also, the influence of observations on a linear combination of regression coefficients is considered. For illustration, a practical example is analyzed using the proposed approach.

16.

Prediction intervals for growth curves

D. N. Naik 《Journal of applied statistics》1990,17(2):245-254

The growth curve model Y_{n×p} = A_{n×m} ξ_{m×k} B_{k×p} + E_{n×p} is considered, where Y is an observation matrix, ξ is a matrix of unknown parameters, A is a known matrix of rank m, B is a known matrix of rank k with 1' = (1, …, 1) as its first row, and the rows of E are independent, each distributed as N_p(0, Σ). The problem of constructing prediction intervals for future observations using this model is considered, and approximate intervals assuming different structures on Σ are derived. The results are illustrated with several data sets.

17.

Influential observations in principal component analysis: a case study

P. Pack I. T. Jolliffe B. J. T. Morgan 《Journal of applied statistics》1988,15(1):39-52

A number of results have been derived recently concerning the influence of individual observations in a principal component analysis. Some of these results, particularly those based on the correlation matrix, are applied to data consisting of seven anatomical measurements on students. The data have a correlation structure which is fairly typical of many found in allometry. This case study shows that theoretical influence functions often provide good estimates of the actual changes observed when individual observations are deleted from a principal component analysis. Different observations may be influential for different aspects of the principal component analysis (coefficients, variances and scores of principal components); these differences, and the distinction between outlying and influential observations are discussed in the context of the case study. A number of other complications, such as switching and rotation of principal components when an observation is deleted, are also illustrated.

18.

On the asymptotic distribution of Cook's distance in logistic regression models

Nirian Martín Leandro Pardo 《Journal of applied statistics》2009,36(10):1119-1146

It sometimes occurs that one or more components of the data exert a disproportionate influence on the model estimation. We need a reliable tool for identifying such troublesome cases in order to decide either to eliminate them from the sample, when the data collection was badly carried out, or otherwise to take care in using the model, because the results could be affected by such components. Since Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18] proposed a measure for detecting influential cases in the linear regression setting, several new measures, apart from the same measure for other models, have been suggested as single-case diagnostics. For most of them some cutoff values have been recommended (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane (2004)]); however, the lack of a quantile-type cutoff for Cook's statistics has led analysts to rely only on index plots as diagnostic tools. Focusing on logistic regression, the aim of this paper is to provide the asymptotic distribution of Cook's distance in order to find a meaningful cutoff point for detecting influential and leverage observations.

19.

Influence diagnostics for the structural errors-in-variables model under the Student-t distribution

Manuel Galea Heleno Bolfarine Filidor Vilcalabra 《Journal of applied statistics》2002,29(8):1191-1204

The influence of observations on the parameter estimates for the simple structural errors-in-variables model with no equation error, under the Student-t distribution, is investigated using the local influence approach. The main conclusion is that the Student-t model with small degrees of freedom is able to incorporate possible outliers and influential observations in the data. The likelihood displacement approach is useful for outlier detection, especially when a masking phenomenon is present and the degrees of freedom parameter is large. The diagnostics are illustrated with two examples.

20.

Bayesian inference and diagnostics in zero-inflated generalized power series regression model

Gladys D. Cacsire Barriga Dipak K. Dey 《Communications in Statistics - Theory and Methods》2013,42(22):6553-6568

ABSTRACT

The paper provides a Bayesian analysis for the zero-inflated regression models based on the generalized power series distribution. The approach is based on Markov chain Monte Carlo methods. The residual analysis is discussed and case-deletion influence diagnostics are developed for the joint posterior distribution, based on the ψ-divergence, which includes several divergence measures such as the Kullback–Leibler divergence, the J-distance, the L₁ norm, and the χ² divergence, in zero-inflated generalized power series models. The methodology is illustrated with a data set collected by wildlife biologists in a state park in California.