
In high-dimensional regression, influential observations may lead to inaccurate analysis results, so detecting these unusual points before statistical regression analysis is an important issue. However, most traditional approaches are based on single-case diagnostics, and they may fail when multiple influential observations are present, because of masking effects. In this paper, an adaptive multiple-case deletion approach is proposed for detecting multiple influential observations in the presence of masking effects in high-dimensional regression. The procedure has two stages. In the first stage, we propose a multiple-case deletion technique and obtain an approximately clean subset of the data that is presumably free of influential observations. In the second stage, to improve efficiency, we refine the detection rule. Monte Carlo simulation studies and a real-data analysis demonstrate the effectiveness of the proposed procedure.


Modeling diagnostics assess models by means of a variety of criteria. Each criterion typically performs its evaluation with respect to a specific inferential objective. For instance, DFBETAS, a well-known diagnostic in linear regression models, is applied to discover the cases that are influential in fitting a model. To facilitate the evaluation of generalized linear mixed models (GLMM), we develop a diagnostic based on the information complexity (ICOMP) criterion for detecting influential cases that substantially affect model selection by ICOMP. For a given model, the diagnostic compares the ICOMP criterion between the full data set and a case-deleted data set. The ICOMP criterion is computed using the Fisher information matrix. A simulation study is conducted, and a real data set of cancer cells is analyzed using a logistic linear mixed model to illustrate the effectiveness of the proposed diagnostic in detecting influential cases.


Influence diagnostics are investigated in this study. In particular, an approach based on the generalized linear mixed model setting is presented for modeling ordered categorical counts in stratified contingency tables. Deletion diagnostics and their first-order approximations are developed for assessing the stratum-specific influence on the parameter estimates of the models. To illustrate the proposed diagnostic technique, the method is applied to two data sets: one from a clinical trial and one from a survey study. The two examples demonstrate that the presence of influential strata may substantially change the results of an ordinal contingency table analysis.

This paper examines local influence assessment in generalized autoregressive conditional heteroscedasticity (GARCH) models with Gaussian and Student-t errors, where influence is measured via the likelihood displacement. The analysis of local influence is discussed under three perturbation schemes: data perturbation, innovative model perturbation and additive model perturbation. For each case, expressions for slope and curvature diagnostics are derived. Monte Carlo experiments are presented to determine threshold values for locating influential observations. An empirical study of daily returns of the New York Stock Exchange composite index shows that local influence analysis is a useful technique for detecting influential observations; most of the observations detected as influential are associated with historical shocks in the market. Finally, based on this empirical study and the analysis of simulated data, some advice is given on how to use the discussed methodology.


Statistical methods are effectively used in the evaluation of pharmaceutical formulations as an alternative to laborious liquid chromatography. However, signal overlap, nonlinearity, multicollinearity and the presence of outliers degrade the performance of statistical methods. Partial Least Squares Regression (PLSR) is a very popular method for quantifying drug formulations with high-dimensional, overlapping spectra. SIMPLS is the most widely used PLSR algorithm, but it is highly sensitive to outliers, which also affect the diagnostics. In this paper, we propose new robust multivariate diagnostics to identify outliers, influential observations and points causing non-normality in a PLSR model. We study the performance of the proposed diagnostics on two commonly used drug systems with highly overlapping spectra: Paracetamol–Caffeine and Doxylamine Succinate–Pyridoxine Hydrochloride.

Plots are presented which are based on the singular value decomposition of the augmented data matrix in regression. In general, these plots assist in identifying discrepant observations, and in conjunction with associated diagnostics they are useful for identifying influential observations.
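As a rough illustration of the quantity such plots are built from, the sketch below takes the thin singular value decomposition of the augmented matrix Z = (X : y) and reads off the diagonal of the projection onto the columns of Z. It is a minimal sketch under the assumption that Z has full column rank; the function name `augmented_leverages` is hypothetical, and the code does not reproduce the paper's plots.

```python
import numpy as np


def augmented_leverages(X, y):
    """Diagonal of the projection onto col(Z), Z = (X : y), plus Z's singular values.

    Assumes Z has full column rank, so no singular value is zero.
    """
    Z = np.column_stack([X, y])
    # Thin SVD: Z = U diag(s) V^T; the columns of U are an orthonormal basis of col(Z)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # The projection onto col(Z) is U U^T, so its i-th diagonal entry is ||U[i]||^2
    return np.sum(U**2, axis=1), s
```

Each diagonal entry equals the ordinary leverage h_ii plus e_i^2 / (e^T e), where e is the least squares residual vector, so a large value can signal either an outlying row of X or a large residual; very small singular values in `s` point to near-collinearity among the columns of Z.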

The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most of these identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. However, this technique, like all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and use it to propose a new technique for identifying multiple influential observations. The advantage of the proposed method for identifying multiple influential cases is then investigated on several frequently cited data sets.
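For reference, the single-case DFFITS that the generalized version starts from can be computed without refitting the model n times. The sketch below is a minimal Python version for ordinary least squares, assuming X already contains an intercept column and has full column rank; it is the classical single-case statistic, not the paper's group-deletion version.

```python
import numpy as np


def dffits(X, y):
    """Single-case DFFITS for an OLS fit of y on X (X includes any intercept column)."""
    n, p = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta
    # Leverages h_ii = x_i^T (X^T X)^{-1} x_i
    h = np.einsum("ij,ji->i", X, np.linalg.solve(XtX, X.T))
    s2 = e @ e / (n - p)
    # Residual variance with case i deleted, from the usual updating formula
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    # Externally studentized residuals, scaled by sqrt(h / (1 - h))
    t = e / np.sqrt(s2_del * (1 - h))
    return t * np.sqrt(h / (1 - h))
```

Because each value uses only the full-data fit and the deletion identities, a cluster of similar influential points can mask one another here: deleting one of them leaves the others to hold the fit in place, which is the weakness that group deletion addresses.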


In this paper, we introduce the Liu estimator for the parameter vector in linear measurement error models and discuss its asymptotic properties. Based on the Liu estimator, diagnostic measures are developed to identify influential observations. In addition, analogs of Cook's distance and the likelihood distance are proposed to detect influential observations using a case-deletion approach. A parametric bootstrap procedure is used to obtain the empirical distributions of the test statistics. Finally, the performance of the influence measures is illustrated through a simulation study and the analysis of a real data set.

Because outliers and leverage observations unduly affect least squares regression, the identification of influential observations is considered an important and integral part of the analysis. However, very few techniques have been developed for residual analysis and diagnostics in minimum sum of absolute errors (L1) regression. Although L1 regression is more resistant to outliers than least squares regression, outliers in the predictor variables (leverage points) may still affect it. In this paper, our objective is to develop an influence measure for L1 regression based on the likelihood displacement function. We illustrate the proposed influence measure with examples.

In this article, we propose influence diagnostics for the gamma regression model (GRM) and the gamma ridge regression model (GRRM). We assess the impact of influential observations on the GRM and GRRM estimates by extending the work of Pregibon [Logistic regression diagnostics. Ann Stat. 1981;9:705–724] and Walker and Birch [Influence measures in ridge regression. Technometrics. 1988;30:221–227]. The two models are compared using a simulation study and a real data set. We report some notable results on detecting influential observations and their effects on the GRM and GRRM estimates.

In the context of the general linear model $Y = X\beta + \varepsilon$, the matrix $P_Z = Z(Z^\top Z)^{-1} Z^\top$, where $Z = (X : Y)$, plays an important role in determining least squares results. In this article we propose two graphical displays for the off-diagonal as well as the diagonal elements of $P_Z$. The two graphs are based on simple ideas and are useful in the detection of potentially influential subsets of observations in regression. Since $P_Z$ is invariant with respect to permutations of the columns of $Z$, an added advantage of these graphs is that they can be used to detect outliers in multivariate data, where the rows of $Z$ are usually regarded as a random sample from a multivariate population. We also suggest two calibration points, one for the diagonal elements of $P_Z$ and the other for the off-diagonal elements. The advantage of these calibration points is that they take into consideration the variability of the off-diagonal as well as the diagonal elements of $P_Z$. They also do not suffer from masking.
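A minimal sketch of the matrix behind these displays: it forms P_Z = Z(Z^T Z)^{-1} Z^T with a linear solve, checks the permutation invariance mentioned above, and lists the observation pairs with the largest off-diagonal magnitudes. The helper names and the choice of "top k pairs" are illustrative assumptions; the paper's graphs and calibration points are not reproduced here, and Z is assumed to have full column rank.

```python
import numpy as np


def projection_matrix(Z):
    """P_Z = Z (Z^T Z)^{-1} Z^T, computed via a linear solve instead of an explicit inverse."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def largest_off_diagonal(P, k=3):
    """Hypothetical helper: the k index pairs (i, j), i < j, with the largest |p_ij|."""
    i, j = np.triu_indices_from(P, k=1)  # strictly upper triangle; P is symmetric
    order = np.argsort(-np.abs(P[i, j]))[:k]
    return list(zip(i[order].tolist(), j[order].tolist()))
```

Reordering the columns of Z only changes the basis of its column space, not the space itself, so `projection_matrix(Z[:, perm])` returns the same matrix; this is why the same displays apply when no column is singled out as the response.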

In this paper, we use a likelihood approach and the local influence method introduced by Cook [Assessment of local influence (with discussion). J Roy Statist Soc Ser B. 1986;48:133–149] to study a vector autoregressive (VAR) model. We present the maximum likelihood estimators and the information matrix. We establish the normal curvature and slope diagnostics for the VAR model under several perturbation schemes and use the Monte Carlo method to obtain benchmark values for determining the influence of directional diagnostics and possible influential observations. An empirical study using the VAR model to fit real data of monthly returns of IBM and S&P500 index illustrates the effectiveness of our proposed diagnostics.

Zhou Jian, Zhang Min. Statistical Research (统计研究), 2014, 31(9): 37–43
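Using cutting-edge theory and methods of macroeconomic data diagnostics and spatial panel models, this paper empirically analyzes the strong-influence characteristics of provincial GDP for 30 Chinese provinces (excluding Tibet) over 1999–2011, and then applies a B-N decomposition to study in depth the mechanism by which these characteristics form. The main findings show that China's provincial GDP is not only spatially correlated but also exhibits significant strong-influence characteristics: the degree and magnitude of influence that each year's sample information exerts on the information of the whole sample period differ from year to year. Moreover, these strong-influence characteristics have a highly significant formation mechanism: the stochastic component is the main source of strong influence, while the cyclical component also plays a fairly important role. These conclusions have important policy implications for assessing the strong-influence characteristics of China's provincial GDP, and offer practical guidance for the government in formulating and refining policies that smooth economic fluctuations and achieve steady and relatively rapid macroeconomic growth.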

In applications of spatial statistics, it is necessary to compute the product of a matrix W of spatial weights and a vector y of observations. The weighting matrix often needs to be adapted to the specific problem, so the computation of Wy cannot always be done with available R packages. This article therefore suggests one way of handling such cases. The proposed technique avoids computing the matrix product by calculating each entry of Wy separately. First, a specific spatial autoregressive process is introduced. The performance of the proposed program is then briefly compared with that of a basic program using matrix multiplication.
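The entry-by-entry idea can be sketched as follows: each (Wy)_i = sum over j of w_ij y_j is accumulated directly from a weight function, so the n × n matrix W is never stored. This is a minimal Python sketch, not the article's program; the inverse-distance weight with a cutoff radius of 1.5 and the optional row standardization are hypothetical choices for illustration.

```python
import numpy as np


def inverse_distance_weight(a, b, radius=1.5):
    """Hypothetical weight: 1 / distance for neighbours within `radius`, else 0."""
    d = float(np.hypot(a[0] - b[0], a[1] - b[1]))
    return 1.0 / d if 0.0 < d <= radius else 0.0


def spatial_lag(coords, y, weight=inverse_distance_weight, row_standardize=True):
    """Compute Wy one entry at a time without ever forming the matrix W."""
    n = len(y)
    wy = np.zeros(n)
    for i in range(n):
        row_sum = 0.0
        acc = 0.0
        for j in range(n):
            if j == i:
                continue  # w_ii = 0: a site is not its own neighbour
            w = weight(coords[i], coords[j])
            row_sum += w
            acc += w * y[j]
        # Optionally rescale so that row i of the implicit W sums to 1
        wy[i] = acc / row_sum if (row_standardize and row_sum > 0) else acc
    return wy
```

Memory use stays O(n) instead of O(n^2), at the cost of evaluating the weight function n(n - 1) times; the weight function can encode whatever problem-specific structure a packaged W would not support.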


In this paper we discuss the identification of influential observations in a growth curve model with Rao's simple covariance structure. Based on the generalized Cook-type distance and the volume of a confidence ellipsoid, a variety of influence measures are proposed in terms of the case-deletion technique. Also, the influence of observations on a linear combination of regression coefficients is considered. For illustration, a practical example is analyzed using the proposed approach.

We consider the growth curve model $Y_{n\times p} = A_{n\times m}\,\xi_{m\times k}\,B_{k\times p} + E_{n\times p}$, where $Y$ is an observation matrix, $\xi$ is a matrix of unknown parameters, $A$ is a known matrix of rank $m$, $B$ is a known matrix of rank $k$ with $\mathbf{1}^\top = (1, \dots, 1)$ as its first row, and the rows of $E$ are independent, each distributed as $N_p(0, \Sigma)$. The problem of constructing prediction intervals for future observations under this model is considered, and approximate intervals are derived under different structures assumed for $\Sigma$. The results are illustrated with several data sets.

A number of results have been derived recently concerning the influence of individual observations in a principal component analysis. Some of these results, particularly those based on the correlation matrix, are applied to data consisting of seven anatomical measurements on students. The data have a correlation structure which is fairly typical of many found in allometry. This case study shows that theoretical influence functions often provide good estimates of the actual changes observed when individual observations are deleted from a principal component analysis. Different observations may be influential for different aspects of the principal component analysis (coefficients, variances and scores of principal components); these differences, and the distinction between outlying and influential observations are discussed in the context of the case study. A number of other complications, such as switching and rotation of principal components when an observation is deleted, are also illustrated.
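The "actual changes" that the influence functions are compared against can be computed by brute force: delete each observation in turn, recompute the correlation matrix, and record how its eigenvalues (the principal component variances) move. The sketch below does only that; it is an illustrative assumption of how such a check might be coded, not the case study's procedure, and it does not compute the theoretical influence functions.

```python
import numpy as np


def eigenvalue_deletion_changes(X):
    """Change in each correlation-matrix eigenvalue when observation i is deleted.

    Row i of the result holds lambda_k(without i) - lambda_k(full), eigenvalues sorted descending.
    """
    n, p = X.shape
    full = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending order
    changes = np.empty((n, p))
    for i in range(n):
        Xi = np.delete(X, i, axis=0)  # data set with observation i removed
        changes[i] = np.linalg.eigvalsh(np.corrcoef(Xi, rowvar=False))[::-1] - full
    return changes
```

Since the eigenvalues of a p × p correlation matrix always sum to p, each row of changes sums to zero: an observation that inflates one component's variance must shrink others. Matching eigenvalues by rank also means that when a deletion makes two components switch order, the reported change mixes the two, which is the switching complication the case study mentions.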

It sometimes occurs that one or more components of the data exert a disproportionate influence on model estimation. We need a reliable tool for identifying such troublesome cases in order to decide whether to eliminate them from the sample (when the data were badly collected) or otherwise to use the model with care, because the results could be affected by such components. Since Cook [Detection of influential observations in linear regression, Technometrics 19 (1977), pp. 15–18.] proposed a measure for detecting influential cases in the linear regression setting, several new measures have been suggested as single-case diagnostics, in addition to versions of the same measure for other models. Cutoff values have been recommended for most of them (see, for instance, [D.A. Belsley, E. Kuh, and R.E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed., John Wiley & Sons, New York, Chichester, Brisbane, (2004).]); however, the lack of a quantile-type cutoff for Cook's statistic has led analysts to rely on index plots as the only worthwhile diagnostic tools. Focusing on logistic regression, the aim of this paper is to derive the asymptotic distribution of Cook's distance in order to obtain a meaningful cutoff point for detecting influential and leverage observations.
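For the linear regression setting of Cook (1977), the distance has a closed form that needs only the full-data fit: D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of coefficients and s^2 the residual mean square. The sketch below is a minimal Python version of that linear-model formula only; it does not cover the logistic regression case or the asymptotic cutoff that the paper derives, and it assumes X includes any intercept column and has full column rank.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each case of an OLS fit of y on X."""
    n, p = X.shape
    XtX = X.T @ X
    beta = np.linalg.solve(XtX, X.T @ y)
    e = y - X @ beta
    # Leverages h_ii = x_i^T (X^T X)^{-1} x_i
    h = np.einsum("ij,ji->i", X, np.linalg.solve(XtX, X.T))
    s2 = e @ e / (n - p)
    # Equivalent to (b - b_(i))^T X^T X (b - b_(i)) / (p s^2) without refitting
    return e**2 * h / (p * s2 * (1 - h) ** 2)
```

The closed form agrees with the deletion definition because b - b_(i) = (X^T X)^{-1} x_i e_i / (1 - h_ii), so the n refits collapse into one fit plus the leverages.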

The influence of observations on the parameter estimates for the simple structural errors-in-variables model with no equation error, under the Student-t distribution, is investigated using the local influence approach. The main conclusion is that the Student-t model with small degrees of freedom is able to incorporate possible outliers and influential observations in the data. The likelihood displacement approach is useful for outlier detection, especially when a masking phenomenon is present and the degrees of freedom parameter is large. The diagnostics are illustrated with two examples.


The paper provides a Bayesian analysis of zero-inflated regression models based on the generalized power series distribution. The approach uses Markov chain Monte Carlo methods. Residual analysis is discussed, and case-deletion influence diagnostics are developed for the joint posterior distribution based on the ψ-divergence, which includes several divergence measures, such as the Kullback–Leibler divergence, the J-distance, the L1 norm and the χ² divergence, in zero-inflated generalized power series models. The methodology is illustrated on a data set collected by wildlife biologists in a state park in California.
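For the Kullback–Leibler member of the ψ-divergence family, the case-deletion diagnostic can be estimated directly from MCMC output. If the observations are conditionally independent given θ, the case-deleted posterior is proportional to p(θ | D) / f(y_i | θ), which gives K_i = log E[1 / f(y_i | θ)] + E[log f(y_i | θ)], with both expectations taken over the full posterior. The sketch below assumes a matrix of pointwise log-likelihood values evaluated at posterior draws is already available; computing that matrix for a zero-inflated generalized power series model is not shown, and the function name is hypothetical.

```python
import numpy as np


def kl_case_deletion(loglik):
    """Kullback-Leibler divergence between the full and each case-deleted posterior.

    loglik[s, i] = log f(y_i | theta_s) for posterior draws theta_1, ..., theta_S.
    Assumes the y_i are conditionally independent given theta.
    """
    neg = -loglik
    m = neg.max(axis=0)  # shift per column for a numerically stable log-mean-exp
    # log E[1 / f(y_i | theta)], estimated by the posterior-draw average
    log_mean_inv_lik = m + np.log(np.mean(np.exp(neg - m), axis=0))
    # Add E[log f(y_i | theta)]; by Jensen's inequality the result is >= 0
    return log_mean_inv_lik + loglik.mean(axis=0)
```

The first term is minus the log conditional predictive ordinate of case i, so large values flag cases that the rest of the data predict poorly. The harmonic-mean-type average inside it can be unstable when some draws give a very small f(y_i | θ), so it is worth checking the estimates across chains or chain lengths.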

