Several methods have been suggested in the literature for detecting influential observations in data fitted by the usual linear model y = Xβ + ε, ε ∼ N(0, σ²I). Chatterjee & Hadi (1986) reviewed most of these methods and described the interrelationships among them. In this article, we extend some of these methods to multivariate regression data. We consider several data sets to illustrate the methods.

A robust case-deletion diagnostic that can avoid masking is presented for nonlinear reproductive dispersion models (NRDM) (Jorgensen, 1997), which include generalized linear models and exponential family nonlinear models as special cases. Based on the second-order approximation of the log-likelihood displacement and Poon & Poon's (1999) conformal normal curvature, a measure of local influence ranging from 0 to 1 is constructed. An example illustrates the application of the techniques.

A method for detecting outliers in axial data was proposed by Best and Fisher (1986). Extending that work, we propose four new methods. Two of them are suitable for outlier detection and depend on the classic geodesic distance and a modified version of this distance. The other two procedures, which are designed for detecting influential observations, are based on the Kullback–Leibler (KL) and Cook’s distances. Simulation experiments are performed to compare all the methods considered, with detection and error rates as the comparison criteria. The numerical results provide evidence in favor of the KL distance.

The identification of influential observations has drawn a great deal of attention in regression diagnostics. Most identification techniques are based on single-case deletion, and among them DFFITS has become very popular with statisticians. However, this technique, like all other single-case diagnostics, may be ineffective in the presence of multiple influential observations. In this paper we develop a generalized version of DFFITS based on group deletion and then use it to propose a new technique for identifying multiple influential observations. The advantage of the proposed method in identifying multiple influential cases is then investigated through several frequently cited data sets.
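
For orientation, the single-case diagnostics that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook formulas for leverage, Cook's distance and DFFITS in ordinary least squares, not the generalized group-deletion DFFITS proposed in the paper; the function name `single_case_diagnostics` is hypothetical, and the design matrix `X` is assumed to already contain an intercept column.

```python
import numpy as np

def single_case_diagnostics(X, y):
    """Leverage, Cook's distance and DFFITS for an OLS fit of y on X.

    Assumption: X is an (n, p) full-rank design matrix that already
    includes any intercept column. Hypothetical helper for illustration.
    """
    n, p = X.shape
    # Diagonal of the hat matrix H = X (X'X)^{-1} X', computed via a thin QR.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)

    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                      # ordinary residuals
    s2 = e @ e / (n - p)                  # residual variance, all cases

    # Cook's distance from internally studentized residuals.
    r = e / np.sqrt(s2 * (1.0 - h))
    cooks = r ** 2 * h / (p * (1.0 - h))

    # Residual variance with case i deleted, without refitting.
    s2_del = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1.0 - h))   # externally studentized residuals
    dffits = t * np.sqrt(h / (1.0 - h))
    return h, cooks, dffits
```

Because each statistic deletes only one case at a time, two or more influential points can hide one another, which is the masking problem the group-deletion approach is meant to address.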

The identification of influential observations in logistic regression has drawn a great deal of attention in recent years. Most of the available techniques, such as Cook's distance and difference of fits (DFFITS), are based on single-case deletion. However, there is evidence that these techniques suffer from masking and swamping problems and consequently fail to detect multiple influential observations. In this paper, we develop a new measure for identifying multiple influential observations in logistic regression based on a generalized version of DFFITS. The advantage of the proposed method is then investigated through several frequently cited data sets and a simulation study.

In regression, detecting anomalous observations is a significant step in the model-building process. Various influence measures, based on different motivational arguments, are designed to measure the influence of observations on different aspects of various regression models. The detection of influential observations in the data is complicated by the existence of multicollinearity. The purpose of this paper is to assess the influence of observations on the Liu [9] and modified Liu [15] estimators by using the approximate case-deletion formulas suggested by Walker and Birch [14]. A numerical example using a real data set used by Longley [10] and a Monte Carlo simulation are given to illustrate the theoretical results.
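
As a rough illustration of what case deletion means for a biased estimator, the sketch below computes the standard Liu estimator, (X'X + I)^{-1}(X'y + d·β̂_OLS) with 0 < d < 1, and measures how far the estimate moves when each case is dropped and the model is refitted. It deliberately uses exact refitting rather than the approximate deletion formulas of Walker and Birch referred to in the abstract, and it does not cover the modified Liu estimator; the function names and the choice of the Euclidean norm as the influence summary are assumptions made for this example.

```python
import numpy as np

def liu_estimator(X, y, d):
    """Liu estimator (X'X + I)^{-1} (X'y + d * beta_OLS).

    Assumption: X is an (n, p) full-rank matrix (typically standardized)
    and d is the Liu biasing parameter, usually taken in (0, 1).
    """
    p = X.shape[1]
    XtX = X.T @ X
    Xty = X.T @ y
    beta_ols = np.linalg.solve(XtX, Xty)
    return np.linalg.solve(XtX + np.eye(p), Xty + d * beta_ols)

def liu_deletion_influence(X, y, d):
    """Euclidean change in the Liu estimate when each case is deleted.

    Exact refit per case (hypothetical helper); the paper instead uses
    approximate case-deletion formulas, which are not reproduced here.
    """
    n = X.shape[0]
    full = liu_estimator(X, y, d)
    influence = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        influence[i] = np.linalg.norm(full - liu_estimator(X[keep], y[keep], d))
    return influence
```

Setting d = 1 recovers the ordinary least-squares estimate, which gives a quick sanity check on the implementation.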

In this paper, two new methods for detecting multiple influential observations, GCD.GSPR and mCD*, are introduced for logistic regression. The proposed diagnostic measures are compared with the generalized difference in fits (GDFFITS) and the generalized squared difference in beta (GSDFBETA), which are diagnostics for multiple influential observations. The simulation study is conducted with logistic regression models containing one, two and five independent variables. The performance of the diagnostic measures is examined for each model when a single independent variable is contaminated and when all the independent variables are contaminated, at given contamination rates and intensities. In addition, the diagnostic measures are compared in terms of correct identification rate and swamping rate on a data set frequently used in the literature.

Since the seminal paper by Cook (1977) in which he introduced Cook's distance, the identification of influential observations has received a great deal of interest and extensive investigation in linear regression. It is well documented that most of the popular diagnostic measures that are based on single-case deletion can mislead the analysis in the presence of multiple influential observations because of the well-known masking and/or swamping phenomena. Atkinson (1981) proposed a modification of Cook's distance. In this paper we propose a further modification of the Cook's distance for the identification of a single influential observation. We then propose new measures for the identification of multiple influential observations, which are not affected by the masking and swamping problems. The efficiency of the new statistics is presented through several well-known data sets and a simulation study.

A large number of statistics are used in the literature to detect outliers and influential observations in the linear regression model. In this paper, comparison studies are carried out to determine which statistic performs better than the others. These include (i) a detailed simulation study and (ii) analyses of several data sets studied by different authors. Different choices of the design matrix of the regression model are considered. Design A studies the performance of the various statistics in detecting scale-shift outliers, and designs B and C provide information on the performance of the statistics in identifying influential observations. For each statistic, we use cutoff points based on the exact distributions and Bonferroni's inequality. The results show that the studentized residual, which is used for detecting mean-shift outliers, is also appropriate for detecting scale-shift outliers, and that Welsch's statistic and Cook's distance are appropriate for detecting influential observations.

In this study, we adapt the sufficient bootstrap to the jackknife-after-bootstrap (JaB) algorithm. The performance of the sufficient and conventional JaB methods is compared for detecting influential observations in linear regression. The comparison is based on two real-world examples and an extensive designed simulation study that includes different sample sizes and various modeling scenarios. The results reveal that the proposed method is a good competitor to the conventional JaB method, with a smaller standard error and less computation.


In this paper, we introduce the Liu estimator for the vector of parameters in linear measurement error models and discuss its asymptotic properties. Based on the Liu estimator, diagnostic measures are developed to identify influential observations. Additionally, analogs of Cook’s distance and the likelihood distance are proposed to determine influential observations using the case-deletion approach. A parametric bootstrap procedure is used to obtain empirical distributions of the test statistics. Finally, the performance of the influence measures is illustrated through a simulation study and the analysis of a real data set.

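Building on Novy (2013), this paper uses the interregional input-output tables for 1997, 2002 and 2007 to measure and decompose, for the first time from a value-added trade perspective, the changes in China's interregional trade costs. The results show the following. First, unlike measures based on traditional trade flows, the measure based on value-added trade indicates that interregional trade costs did not rise over 1997-2007 but instead fell sharply. Second, although interregional trade costs have fallen, the decline mainly reflects low-end integration driven by inland regions and primary products. Finally, a further bilateral decomposition suggests that this low-end integration may arise because, with few neighboring regions, excessive similarity of industrial structure and a lack of policy, the eastern coast increasingly substitutes foreign demand. As the eastern coast's external dependence rises, the integration of the interregional division of labor will face a vicious circle of "faster integration of the division of labor in primary products → excessive exports by service industries → export capture → still faster integration of the division of labor in primary products". Therefore, only by further changing the concept of development, reversing the incentives for market segmentation and accelerating infrastructure construction can the quality of interregional integration be fundamentally improved.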

Detection of multiple unusual observations such as outliers, high leverage points and influential observations (IOs) in regression is still a challenging task for statisticians because of the well-known masking and swamping effects. In this paper we introduce a robust influence distance that can identify multiple IOs, and propose a sixfold plotting technique based on the well-known group deletion approach to classify regular observations, outliers, high leverage points and IOs simultaneously in linear regression. Experiments on several frequently cited data sets and simulation studies demonstrate that the proposed algorithm performs successfully in the presence of multiple unusual observations and can avoid masking and/or swamping effects.

马丹  何雅兴 《统计研究》2020,37(3):3-19
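Under a trade pattern based on imported intermediate products, the deeper reasons why the foreign value added in China's exports can hardly keep growing need to be analyzed through the dynamic evolution of that foreign value added. Based on the origin of the raw materials embodied in exports and the accounting balance of value added between countries, this paper builds a production accounting model that directly measures the foreign value added in exports. It decomposes the change in foreign value added in exports into five parts (the importing country's value-adding capacity, the importing country's technology level, international trade linkages, China's technology level, and changes in the level of exports) and discusses the influence of changes in China's export margins. The results show that the movement of exports along the two margins is the main cause of the change in the foreign value added in China's exports. A further micro-level analysis of changes in export margins, comparing firms with different technology levels, finds that the international trade environment, economies of scale and policy support remain the external conditions for Chinese manufacturing firms to keep exporting, whereas factors such as product quality and production efficiency provide the internal drive for China's high-technology firms to survive in overseas markets.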


This paper is devoted to the application of singular-spectrum analysis to the sequential detection of changes in time series. An algorithm for change-point detection in time series, based on the sequential application of singular-spectrum analysis, is developed and studied. The algorithm is applied to different data sets and extensively studied numerically. For specific models, several numerical approximations to the error probabilities and the power function of the algorithm are obtained. Numerical comparisons with other methods are given.
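
The general idea of such a sequential procedure can be sketched as follows, under simplifying assumptions that are not taken from the paper: a base window of the series is embedded into a trajectory matrix, its leading left singular vectors define a signal subspace, and the mean squared distance of the lagged vectors of the following test window from that subspace serves as the detection statistic. The function name, the parameters `base_len`, `lag`, `rank` and `test_len`, and the absence of any normalization or decision threshold are all choices made for this illustration.

```python
import numpy as np

def ssa_change_statistic(x, base_len, lag, rank, test_len):
    """Sliding SSA-type detection statistic (simplified sketch).

    For each start position, the base window x[start : start + base_len]
    is embedded into a (lag x K) trajectory matrix, and the span of its
    first `rank` left singular vectors is taken as the signal subspace.
    The statistic is the mean squared residual of the lagged vectors of
    the next `test_len` points after projection onto that subspace.
    All parameter names are hypothetical.
    """
    x = np.asarray(x, dtype=float)
    if test_len < lag or base_len < lag:
        raise ValueError("base_len and test_len must be at least lag")
    n_base_cols = base_len - lag + 1
    n_test_cols = test_len - lag + 1
    stats = []
    for start in range(len(x) - base_len - test_len + 1):
        base = x[start:start + base_len]
        traj = np.column_stack([base[j:j + lag] for j in range(n_base_cols)])
        U, _, _ = np.linalg.svd(traj, full_matrices=False)
        U_r = U[:, :rank]

        test = x[start + base_len:start + base_len + test_len]
        T = np.column_stack([test[j:j + lag] for j in range(n_test_cols)])
        resid = T - U_r @ (U_r.T @ T)     # part of T outside the subspace
        stats.append(np.sum(resid ** 2) / T.size)
    return np.array(stats)
```

The statistic stays near zero while the test window follows the same structure as the base window and grows once the test window extends past a structural change; a practical detector would add a normalization and a threshold or cumulative-sum rule on top of it.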

叶青  韩立岩 《统计研究》2012,29(3):97-101
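This paper uses the wavelet transform modulus maxima method to analyze abrupt changes in the US securities market during the subprime mortgage crisis. The study finds that the wavelet modulus maxima method accurately locates the specific times of anomalies in financial asset prices; that it detects two types of singular points, of which peak-point detection is more robust than zero-crossing detection; and that these singular points correspond to major economic events in the main stages of the US subprime crisis, reflecting the impact of anomalies in the US economic system on financial markets during the crisis. The paper concludes with a robustness check.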

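Conditional VaR estimation and leverage effect analysis based on quantile regression models
In the literature, most analyses of the leverage effect are based on ARCH-type models. This paper applies the quantile regression model and its change-point detection model to analyze CVaR conditional on "realized" volatility, and attempts to analyze the leverage effect from the perspective of CVaR. Finally, an empirical study of the Chinese stock market is carried out, yielding estimates of CVaR conditional on "realized" volatility and an analysis of the leverage effect in the Chinese stock market.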

This article discusses the ability of information criteria to select the correct model among different generalized autoregressive conditional heteroscedasticity (GARCH) processes, especially higher-order ones, using the probability of correct selection as the measure of performance. Each of the GARCH processes considered is simulated at different parameter combinations to study the possible effect of different volatility structures on these information criteria. We find that the volatility structure of the time series affects the performance of the criteria. Sample size is also observed to affect their ability to select the correct model.

杨蕙馨  张红霞 《统计研究》2020,37(10):66-78
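Based on a production decomposition model of value added and final products, this paper measures the global value chain (GVC) embeddedness of China's manufacturing industry under forward and backward industrial linkages, empirically analyzes the mechanism through which GVC embeddedness affects technological innovation, and on this basis focuses on the moderating roles of two important contextual factors, absorptive capacity and the technology gap. Difference-in-differences, instrumental variables and a GMM dynamic panel model are used as robustness checks to control for potential endogeneity. The study finds that: ① China's manufacturing industry improves its technological innovation capability through the international knowledge spillover effect of embedding in GVCs; ② absorptive capacity strengthens this positive relationship; ③ the technology gap has an inverted U-shaped moderating effect on the relationship between backward GVC embeddedness and technological innovation, and a positive moderating effect on the relationship between forward GVC embeddedness and technological innovation. This paper extends network embeddedness theory and knowledge spillover theory from organizational networks to the field of GVCs, enriches research on GVC embeddedness, and provides an important theoretical reference for Chinese manufacturing firms seeking to improve their technological innovation capability through GVC embeddedness while participating in the international division of labor.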

