Similar literature
 20 similar articles found (search time: 31 ms)
1.
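Li Xiangjie et al., 《统计研究》 (Statistical Research), 2018, 35(7): 115-124
Classical sufficient dimension reduction methods perform poorly when the explanatory variables contain outliers or follow a heavy-tailed distribution. To address this, after analyzing weighting and cumulative slicing in sufficient dimension reduction theory, this paper proposes a robust dimension reduction method that combines the two: cumulative weighted sliced inverse regression (CWSIR). The method is fairly robust when the predictors contain outliers and in small samples, and it avoids having to choose the number of slices. Numerical simulations show that CWSIR outperforms traditional sliced inverse regression (SIR), cumulative slicing estimation (CUME), contour-based sliced inverse regression estimation (CPSIR), weighted canonical correlation estimation (WCANCOR), sliced inverse median estimation (SIME), and weighted inverse regression estimation (WIRE). Finally, an analysis of real data from a video website also confirms the effectiveness of CWSIR.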

2.
The detection of influential observations on the estimation of the dimension reduction subspace returned by Sliced Inverse Regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.

3.
This paper documents situations where the variance inflation model for outliers has undesirable properties. The model is commonly used to accommodate outliers in a Bayesian analysis of regression and time series models. The alternative approach provided here does not suffer from these undesirable properties but gives inferences similar to those of the variance inflation model when this is appropriate. It can be used with regression, time series, and regression with correlated errors in a unified way, and adheres to the scientific principle that inference should be based on the data after obvious outliers have been discarded. Only one parameter is required for outliers; it is interpretable as the a priori willingness to remove observations from the analysis.

4.
The geometric mean (GM) is finding increasingly wide application in statistical data analysis as a measure of central tendency. It is generally believed that the GM is less sensitive to outliers than the arithmetic mean (AM), but we suspect that, like the AM, the GM may also suffer a serious setback in the presence of outliers, especially when multiple outliers occur in a data set. As far as we know, little work has been done on the robustness of the GM. In search of a simple robust measure of central tendency, we propose the geometric median (GMed) in this paper. We show that the classical GM has a breakdown point of 0%, whereas the proposed GMed has a breakdown point of 50%. Numerical examples also support our claim that the proposed GMed is unaffected by multiple outliers and attains the highest possible breakdown point of 50%. We then develop a new method for identifying multiple outliers based on the proposed GMed. A variety of numerical examples show that the proposed method successfully identifies all potential outliers, while the traditional GM fails to do so.
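
The breakdown-point contrast described above can be illustrated with a small sketch. The abstract does not define GMed, so the `log_median` function below (the exponential of the median of the logs) is only an assumed stand-in for a median-type robust analogue of the GM, not the authors' estimator; both function names are hypothetical.

```python
import math
import statistics

def geometric_mean(xs):
    """Classical GM: exponential of the arithmetic mean of the logs (positive data only)."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def log_median(xs):
    """Assumed robust analogue (not necessarily the paper's GMed):
    exponential of the median of the logs."""
    return math.exp(statistics.median([math.log(x) for x in xs]))

clean = [2.0, 4.0, 8.0, 16.0, 32.0]
contaminated = [2.0, 4.0, 8.0, 16.0, 1e9]  # one gross outlier replaces 32

# A single extreme value drags the GM far from the bulk of the data,
# while the median-based summary stays at the central observation.
print(geometric_mean(clean), geometric_mean(contaminated))
print(log_median(clean), log_median(contaminated))
```

One contaminated observation is enough to move the GM arbitrarily far, which is what a 0% breakdown point means; the median-based summary tolerates contamination of up to half the sample.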

5.
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, which combines the SIR method with a geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that the isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.

6.
The discordancy test for multiple outliers is complicated by problems of masking and swamping. The key to the settlement of the question lies in the determination of k, i.e. the number of 'contaminants' in a sample. Great efforts have been made to solve this problem in recent years, but no effective method has been developed. In this paper, we present two ways of determining k, free from the effects of masking and swamping, when testing upper (lower) outliers in normal samples. Examples are given to illustrate the methods.

7.
The Cook statistic has been developed for detecting outliers in two likely situations in which outliers occur in multi-response experiments. In the first situation, more than one outlying observation vector is considered. Each of these vectors is obtained on the assumption that a particular observation from each of the responses is an outlier. A general expression of the Cook statistic for detecting any such t outlying observation vectors is obtained, and some particular cases are then considered. In the second situation, observations from all the responses need not be outliers. Here also a general expression of the Cook statistic is obtained for detecting any t observations from each of any k responses as outliers. In both cases the Cook statistic is applied to real experimental data.

8.
L. Ferré, A. F. Yao, Statistics, 2013, 47(6): 475-488
Most of the usual multivariate methods have been extended to the context of functional data analysis. Our contribution concerns the study of sliced inverse regression (SIR) when the response variable is real but the regressor is a function. In the first part, we show how the relevant properties of SIR remain essentially the same in the functional context under suitable conditions. Unfortunately, the estimation procedure used in the multivariate case cannot be directly transposed to the functional one. Then, we propose a solution that overcomes this difficulty and we show the consistency of the estimates of the parameters of the model.

9.
Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.

10.
A large number of statistics are used in the literature to detect outliers and influential observations in the linear regression model. In this paper, comparison studies are carried out to determine which statistic performs better than the others. These include: (i) a detailed simulation study, and (ii) analyses of several data sets studied by different authors. Different choices of the design matrix of the regression model are considered. Design A studies the performance of the various statistics for detecting scale shift outliers, and designs B and C provide information on the performance of the statistics for identifying influential observations. We have used cutoff points based on the exact distributions and Bonferroni's inequality for each statistic. The results show that the studentized residual, which is used for detecting mean shift outliers, is also appropriate for detecting scale shift outliers, and that Welsch's statistic and Cook's distance are appropriate for detecting influential observations.
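
Two of the statistics named above can be sketched for a simple linear regression. This is a minimal illustration using the standard textbook formulas for leverage, the internally studentized residual, and Cook's distance; the function name `regression_diagnostics` and the toy data are assumptions, and the abstract does not say which studentized variant or cutoff values the authors used.

```python
import math

def regression_diagnostics(x, y):
    """Return (leverage, internally studentized residual, Cook's distance)
    for each observation of the OLS fit y = b0 + b1 * x."""
    n, p = len(x), 2  # p = number of fitted coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - p)  # residual variance estimate
    out = []
    for xi, r in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx   # leverage
        t = r / math.sqrt(s2 * (1.0 - h))      # studentized residual
        d = t * t * h / (p * (1.0 - h))        # Cook's distance
        out.append((h, t, d))
    return out

# Toy data: an exact line with one shifted observation at the last x value.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[9] += 20.0
for i, (h, t, d) in enumerate(regression_diagnostics(x, y)):
    print(i, round(h, 3), round(t, 3), round(d, 3))
```

In this toy example the shifted point combines a large residual with high leverage, so it has the largest Cook's distance.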

11.
In many situations, the quality of a process or product may be better characterized and summarized by a relationship between the response variable and one or more explanatory variables. Parameter estimation is the first step in constructing control charts. Outliers may distort classical estimators and lead to incorrect conclusions. To remedy the problem of outliers, robust methods have been developed recently. In this article, a robust method is introduced for estimating the parameters of simple linear profiles. Two weight functions, Huber and bisquare, are applied in the estimation algorithm. In addition, a method for robust estimation of the error-term variance is proposed. Simulation studies are carried out to evaluate the performance of the proposed estimator, as well as the classical one, in the presence and absence of outliers under different scenarios by means of the MSE criterion. The results reveal that the proposed robust estimators perform as well as the classical estimators in the absence of outliers and considerably better when outliers exist. In one scenario, the maximum variance estimate obtained from the classical estimator is 10.9, compared with 1.66 and 1.27 from the proposed robust estimators, when the actual value is 1.
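
The Huber and bisquare weight functions mentioned above are commonly used inside iteratively reweighted least squares (IRLS). The sketch below assumes that setting for a simple line fit, with the usual tuning constants (1.345 and 4.685) and a MAD-type scale estimate; it is not the authors' algorithm, and all function names are hypothetical.

```python
import statistics

def huber_weight(u, c=1.345):
    """Huber weight: 1 inside [-c, c], decaying as c/|u| outside."""
    a = abs(u)
    return 1.0 if a <= c else c / a

def bisquare_weight(u, c=4.685):
    """Tukey bisquare weight: smooth descent to exactly 0 for |u| >= c."""
    a = abs(u)
    return (1.0 - (a / c) ** 2) ** 2 if a < c else 0.0

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def robust_line_fit(x, y, weight_fn, max_iter=50, tol=1e-10):
    """IRLS: start from OLS, then reweight by scaled residuals until stable."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745  # MAD-type scale
        if scale < 1e-12:  # the bulk of the data is fitted exactly; stop
            break
        w = [weight_fn(r / scale) for r in resid]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        done = abs(new_intercept - intercept) + abs(new_slope - slope) < tol
        intercept, slope = new_intercept, new_slope
        if done:
            break
    return intercept, slope

# Toy profile: an exact line with one contaminated response in the middle.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[5] += 50.0
print(weighted_line_fit(x, y, [1.0] * 10))   # OLS, pulled by the outlier
print(robust_line_fit(x, y, bisquare_weight))
```

The bisquare function gives gross outliers a weight of exactly zero, whereas the Huber function only downweights them, which is one reason the two can behave differently under heavy contamination.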

12.
The sampling-importance resampling (SIR) algorithm aims at drawing a random sample from a target distribution π. First, a sample is drawn from a proposal distribution q, and then from this a smaller sample is drawn with sample probabilities proportional to the importance ratios π/q. We propose here a simple adjustment of the sample probabilities and show that this gives faster convergence. The results indicate that our version converges better also for small sample sizes. The SIR algorithms are compared with the Metropolis–Hastings (MH) algorithm with independent proposals. Although MH converges asymptotically faster, the results indicate that our improved SIR version is better than MH for small sample sizes. We also establish a connection between the SIR algorithms and importance sampling with normalized weights. We show that the use of adjusted SIR sample probabilities as importance weights reduces the bias of the importance sampling estimate.
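
The two steps of the basic SIR algorithm described above can be sketched as follows. This is a minimal version that resamples with replacement and uses unadjusted importance ratios; the adjustment proposed in the abstract is not specified there and is not implemented here. The function name `sir_sample`, the standard normal target, and the wider normal proposal are illustrative assumptions.

```python
import math
import random

def sir_sample(log_target, draw_proposal, log_proposal, n_proposals, n_keep, rng):
    """Basic sampling-importance resampling.

    log_target and log_proposal may omit normalizing constants, because the
    resampling probabilities are normalized implicitly by rng.choices.
    """
    draws = [draw_proposal(rng) for _ in range(n_proposals)]
    log_w = [log_target(v) - log_proposal(v) for v in draws]
    m = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample (with replacement) proportionally to the importance ratios pi/q.
    return rng.choices(draws, weights=weights, k=n_keep)

rng = random.Random(0)
sample = sir_sample(
    log_target=lambda v: -0.5 * v * v,       # standard normal, unnormalized
    draw_proposal=lambda r: r.gauss(0.0, 3.0),
    log_proposal=lambda v: -v * v / 18.0,    # N(0, 3^2), unnormalized
    n_proposals=20000,
    n_keep=2000,
    rng=rng,
)
mean = sum(sample) / len(sample)
var = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
print(mean, var)
```

The resampled values should have mean near 0 and variance near 1, matching the assumed target. Keeping the resample much smaller than the proposal sample follows the description in the abstract.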

13.
Because outliers and leverage observations unduly affect least squares regression, the identification of influential observations is considered an important and integral part of the analysis. However, very few techniques have been developed for residual analysis and diagnostics in minimum sum of absolute errors (L1) regression. Although L1 regression is more resistant to outliers than least squares regression, it appears that outliers in the predictor variables (leverage points) may affect it. In this paper, our objective is to develop an influence measure for L1 regression based on the likelihood displacement function. We illustrate the proposed influence measure with examples.

14.
Although there has been increasing interest in monitoring and diagnosing multistage processes in recent years, this issue has been largely overlooked in cascade processes where the quality characteristics are prone to outliers. The presence of outliers has a debilitating effect on the detectability of the traditional cause-selecting control charts and thus makes them unreliable. Therefore, the purpose of this article is to provide a robust approach to quality control in multistage processes. It is assumed that the process consists of two stages and that the historical data for both dependent quality characteristics contain outliers. A robust fitting procedure based on a compound estimator is employed to build the relationship between the quality variables, and a robust monitoring approach is presented. Subsequently, simulation studies are undertaken to assess the performance of the robust scheme by means of the average run length (ARL) criterion. It is shown that the proposed robust procedure detects diverse types of shift much faster.

15.
The aim of this paper is to quantify the amount of information in the Pareto distribution in the presence of outliers. To this end, Shannon entropy, ?-entropy, Fisher information, and Kullback–Leibler distance are computed. Furthermore, a section is devoted to comparing these quantities in the two cases of the Pareto distribution (with outliers and the homogeneous case). At the end of the paper, two real examples related to insurance companies are presented. A brief summary of the work is also given.

16.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when various variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.

17.
The aim of this article is to analyse the effect of the level shift and temporary change outliers on the estimation of a model with conditional heteroscedasticity, a concept rarely dealt with up to now, the literature focusing more on additive outliers. To do this, we have conducted various Monte Carlo experiments in which the bias produced by these outliers is analysed.

18.
The purpose of this paper is to examine the robustness in finite samples of a test for outliers based on the maximum internally studentized residual and the RESET test for functional form misspecification. The effects of incorrect specification on the adequate detection of outliers and the presence of one or more outliers on the rejection frequencies of RESET are analysed. It is found that, in general, the test for outliers does not seem to be robust to functional form misspecification, while the rejection frequencies of RESET can be reduced, sometimes dramatically, in the presence of outliers.

19.
The zero-inflated Poisson distribution has been used to model count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes, so outlier identification and robust parameter estimation are important for such a distribution. Some outlier identification methods are studied in this paper, and their applications and results are presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of our proposed parameter estimates.
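
The trimmed mean and Winsorized mean named above can be sketched on a small count sample. How the authors turn these summaries into zero-inflated Poisson parameter estimates is not stated in the abstract, so the sketch only shows the two location summaries themselves; the function names, the per-tail count `k`, and the toy data are assumptions.

```python
def trimmed_mean(xs, k):
    """Mean after discarding the k smallest and k largest values (needs 2k < n)."""
    s = sorted(xs)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

def winsorized_mean(xs, k):
    """Mean after replacing the k smallest and k largest values by their
    nearest retained neighbours (needs 2k < n)."""
    s = sorted(xs)
    if k > 0:
        s[:k] = [s[k]] * k
        s[-k:] = [s[-k - 1]] * k
    return sum(s) / len(s)

counts = [0, 0, 0, 1, 1, 2, 2, 3, 40]  # toy count data with one gross outlier
print(sum(counts) / len(counts))       # ordinary mean, inflated by the outlier
print(trimmed_mean(counts, 1))
print(winsorized_mean(counts, 1))
```

Both summaries stay close to the bulk of the counts, while the ordinary mean is pulled upward by the single large value.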

20.
Principal component analysis (PCA) is a popular technique for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has motivated the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and multiple imputation (MI) were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on EM estimates in place of outliers, and PCA based on MI estimates in place of outliers were compared with the results of CPCA. In this study, we tried to show the effects of replacing outliers with estimates obtained by MI and EM, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest replacing outliers with estimates obtained by MI and EM as an alternative approach.

