首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 375 毫秒
1.
In this article, we use the influence function matrix of auto and cross-correlations of a bivariate (multivariate) time series for detecting the outliers. The multivariate analog of the graphical method of Chernick et. al. (1982), to detect outliers and partial outliers is presented. A simulation study illustrating the method is also given.  相似文献   

2.
Detection of outliers or influential observations is an important work in statistical modeling, especially for the correlated time series data. In this paper we propose a new procedure to detect patch of influential observations in the generalized autoregressive conditional heteroskedasticity (GARCH) model. Firstly we compare the performance of innovative perturbation scheme, additive perturbation scheme and data perturbation scheme in local influence analysis. We find that the innovative perturbation scheme give better result than other two schemes although this perturbation scheme may suffer from masking effects. Then we use the stepwise local influence method under innovative perturbation scheme to detect patch of influential observations and uncover the masking effects. The simulated studies show that the new technique can successfully detect a patch of influential observations or outliers under innovative perturbation scheme. The analysis based on simulation studies and two real data sets show that the stepwise local influence method under innovative perturbation scheme is efficient for detecting multiple influential observations and dealing with masking effects in the GARCH model.  相似文献   

3.
In this paper, we present a test procedure to detect outliers in the one-parameter exponential distribution based on prediction. The distribution of the test statistic is obtained. The proposed test can be used to detect more than one outlier and the required percentage points can be easily determined. Furthermore, the test provides a simple procedure to detect whether a given set of data is free from outliers or spurious observations.  相似文献   

4.
Outlier detection has been used extensively in data analysis to detect anomalous observation in data. It has important applications such as in fraud detection and robust analysis, among others. In this paper, we propose a method in detecting multiple outliers in linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is done to verify the accuracy of the proposed method. Low probability of masking and swamping effects indicate the validity of the proposed approach. Also, the illustrations to two sets of real data are given to show its practical applicability.  相似文献   

5.
The use of logistic regression modeling has seen a great deal of attention in the literature in recent years. This includes all aspects of the logistic regression model including the identification of outliers. A variety of methods for the identification of outliers, such as the standardized Pearson residuals, are now available in the literature. These methods, however, are successful only if the data contain a single outlier. In the presence of multiple outliers in the data, which is often the case in practice, these methods fail to detect the outliers. This is due to the well-known problems of masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is then investigated through several examples.  相似文献   

6.
Abstract

In this paper, we propose an outlier-detection approach that uses the properties of an intercept estimator in a difference-based regression model (DBRM) that we first introduce. This DBRM uses multiple linear regression, and invented it to detect outliers in a multiple linear regression. Our outlier-detection approach uses only the intercept; it does not require estimates for the other parameters in the DBRM. In this paper, we first employed a difference-based intercept estimator to study the outlier-detection problem in a multiple regression model. We compared our approach with several existing methods in a simulation study and the results suggest that our approach outperformed the others. We also demonstrated the advantage of our approach using a real data application. Our approach can extend to nonparametric regression models for outliers detection.  相似文献   

7.
Mixture regression models are used to investigate the relationship between variables that come from unknown latent groups and to model heterogenous datasets. In general, the error terms are assumed to be normal in the mixture regression model. However, the estimators under normality assumption are sensitive to the outliers. In this article, we introduce a robust mixture regression procedure based on the LTS-estimation method to combat with the outliers in the data. We give a simulation study and a real data example to illustrate the performance of the proposed estimators over the counterparts in terms of dealing with outliers.  相似文献   

8.
In geostatistics, detecting atypical observations is of special interest due to the changes they can cause in environmental and geological patterns. Several methods for detecting them have been already suggested for the univariate spatial case. However, the problem is more complicated when various variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identification of outliers in multivariate spatial linear models.  相似文献   

9.
The presence of outliers in the data sets affects the structure of multicollinearity which arises from a high degree of correlation between explanatory variables in a linear regression analysis. This affect could be seen as an increase or decrease in the diagnostics used to determine multicollinearity. Thus, the cases of outliers reduce the reliability of diagnostics such as variance inflation factors, condition numbers and variance decomposition proportions. In this study, we propose to use a robust estimation of the correlation matrix obtained by the minimum covariance determinant method to determine the diagnostics of multicollinearity in the presence of outliers. As a result, the present paper demonstrates that the diagnostics of multicollinearity obtained by the robust estimation of the correlation matrix are more reliable in the presence of outliers.  相似文献   

10.
In this paper we present a "model free' method of outlier detection for Gaussian time series by using the autocorrelation structure of the time series. We also present a graphic diagnostic method in order to distinguish an additive outlier (AO) from an innovation outlier (IO). The test statistic for detecting the outlier has a χ ² distribution with one degree of freedom. We show that this method works well when the time series contain either one type of the outliers or both additive and innovation type outliers, and this method has the advantage that no time series model needs to be estimated from the data. Simulation evidence shows that different types of outliers can be graphically distinguished by using the techniques proposed.  相似文献   

11.
In this paper, we discuss the bivariate Birnbaum-Saunders accelerated lifetime model, in which we have modeled the dependence structure of bivariate survival data through the use of frailty models. Specifically, we propose the bivariate model Birnbaum-Saunders with the following frailty distributions: gamma, positive stable and logarithmic series. We present a study of inference and diagnostic analysis for the proposed model, more concisely, are proposed a diagnostic analysis based in local influence and residual analysis to assess the fit model, as well as, to detect influential observations. In this regard, we derived the normal curvatures of local influence under different perturbation schemes and we performed some simulation studies for assessing the potential of residuals to detect misspecification in the systematic component, the presence in the stochastic component of the model and to detect outliers. Finally, we apply the methodology studied to real data set from recurrence in times of infections of 38 kidney patients using a portable dialysis machine, we analyzed these data considering independence within the pairs and using the bivariate Birnbaum-Saunders accelerated lifetime model, so that we could make a comparison and verify the importance of modeling dependence within the times of infection associated with the same patient.  相似文献   

12.
由于传统因子分析方法对离群值较敏感,导致计算结果与实际不相符。针对这一现象,本文运用FAST-MCD方法对传统因子分析方法进行改进,构建出因子分析的稳健算法,以克服离群值的影响,并对此方法进行了模拟和实证分析。模拟和实证分析结果均表明:因子旋转前后,当数据中不存在离群值时,传统因子分析与稳健因子分析得到的结果基本保持一致;当数据中存在离群值时,运用传统因子分析得到的结果出现较大变化,而运用稳健因子分析方法得到的结果基本不变,这说明相对于传统因子分析方法,稳健因子分析方法能有效抵抗离群值的影响,具有良好的抗干扰性和高抗差性。  相似文献   

13.
时间序列自回归AR模型在建模过程中易受离群值的影响,导致计算结果与实际不相符。针对这一现象,运用FQn统计量对传统自相关函数进行改进,构建出自回归AR模型的稳健估计算法,以克服离群值的影响,并对此方法进行了模拟和实证分析。模拟和实证分析均表明:当时序数据中不存在离群值时,传统估计方法与稳健估计方法得到的结果基本保持一致;当数据中存在离群值时,运用传统估计方法得到的结果出现较大变化,而运用稳健估计方法得到的结果基本不变.这说明相对于传统估计方法,稳健估计方法能有效抵抗离群值的影响,具有良好的抗干扰性和高抗差性。  相似文献   

14.
In this article, utilizing a scale mixture of skew-normal distribution in which mixing random variable is assumed to follow a mixture model with varying weights for each observation, we introduce a generalization of skew-normal linear regression model with the aim to provide resistant results. This model, which also includes the skew-slash distribution in a particular case, allows us to accommodate and detect outlying observations under the skew-normal linear regression model. Inferences about the model are carried out through the empirical Bayes approach. The conditions for propriety of the posterior and for existence of posterior moments are given under the standard noninformative priors for regression and scale parameters as well as proper prior for skewness parameter. Then, for Bayesian inference, a Markov chain Monte Carlo method is described. Since posterior results depend on the prior hyperparameters, we estimate them adopting the empirical Bayes method as well as using a Monte Carlo EM algorithm. Furthermore, to identify possible outliers, we also apply the Bayes factor obtained through the generalized Savage-Dickey density ratio. Examining the proposed approach on simulated instance and real data, it is found to provide not only satisfactory parameter estimates rather allow identifying outliers favorably.  相似文献   

15.
Geometric mean (GM) is having growing and wider applications in statistical data analysis as a measure of central tendency. It is generally believed that GM is less sensitive to outliers than the arithmetic mean (AM) but we suspect likewise the AM the GM may also suffer a huge set back in the presence of outliers, especially when multiple outliers occur in a data. So far as we know, not much work has been done on the robustness issue of GM. In quest of a simple robust measure of central tendency, we propose the geometric median (GMed) in this paper. We show that the classical GM has only 0% breakdown point while it is 50% for the proposed GMed. Numerical examples also support our claim that the proposed GMed is unaffected in the presence of multiple outliers and can maintain the highest possible 50% breakdown. Later we develop a new method for the identification of multiple outliers based on this proposed GMed. A variety of numerical examples show that the proposed method can successfully identify all potential outliers while the traditional GM fails to do so.  相似文献   

16.
In this article, a robust variable selection procedure based on the weighted composite quantile regression (WCQR) is proposed. Compared with the composite quantile regression (CQR), WCQR is robust to heavy-tailed errors and outliers in the explanatory variables. For the choice of the weights in the WCQR, we employ a weighting scheme based on the principal component method. To select variables with grouping effect, we consider WCQR with SCAD-L2 penalization. Furthermore, under some suitable assumptions, the theoretical properties, including the consistency and oracle property of the estimator, are established with a diverging number of parameters. In addition, we study the numerical performance of the proposed method in the case of ultrahigh-dimensional data. Simulation studies and real examples are provided to demonstrate the superiority of our method over the CQR method when there are outliers in the explanatory variables and/or the random error is from a heavy-tailed distribution.  相似文献   

17.
The presence of outliers would inevitably lead to distorted analysis and inappropriate prediction, especially for multiple outliers in high-dimensional regression, where the high dimensionality of the data might amplify the chance of an observation or multiple observations being outlying. Noting that the detection of outliers is not only necessary but also important in high-dimensional regression analysis, we, in this paper, propose a feasible outlier detection approach in sparse high-dimensional linear regression model. Firstly, we search a clean subset by use of the sure independence screening method and the least trimmed square regression estimates. Then, we define a high-dimensional outlier detection measure and propose a multiple outliers detection approach through multiple testing procedures. In addition, to enhance efficiency, we refine the outlier detection rule after obtaining a relatively reliable non-outlier subset based on the initial detection approach. By comparison studies based on Monte Carlo simulation, it is shown that the proposed method performs well for detecting multiple outliers in sparse high-dimensional linear regression model. We further illustrate the application of the proposed method by empirical analysis of a real-life protein and gene expression data.  相似文献   

18.
史兴杰等 《统计研究》2020,37(9):95-105
对于实证研究中经常遇到变量维数高和存在异常值的二分类问题,探索稳健的高维二分类方法显得尤为重要。本文提出基于Lasso惩罚的光滑0-1损失函数二分类法,并利用Fabs 算法高效地解决了变量选择和参数估计问题。数值模拟的结果表明,在不同异常值比例下该方法均具有良好的稳健性。基于CHIP 2013年度数据,利用该方法对农民工子女高中入学决定的影响因素进行了实证研究。分析发现,农民工父母的教育水平、教育水平与家庭经济状况的交互作用、农民工子女性别、性别与民族的交互作用均对农民工子女的入学决定有重要影响。  相似文献   

19.
In this paper, we propose a method for outlier detection and removal in electromyographic gait-related patterns (EMG-GRPs). The goal was to detect and remove EMG-GRPs that reduce the quality of gait data while preserving natural biological variations in EMG-GRPs. The proposed procedure consists of general statistical tests and is simple to use. The Friedman test with multiple comparisons was used to find particular EMG-GRPs that are extremely different from others. Next, outlying observations were calculated for each suspected stride waveform by applying the generalized extreme studentized deviate test. To complete the analysis, we applied different outlier criteria. The results suggest that an EMG-GRP is an outlier if it differs from at least 50% of the other stride waveforms and contains at least 20% of the outlying observations. The EMG signal remains a realistic representation of muscle activity and demonstrates step-by-step variability once the outliers, as defined here, are removed.  相似文献   

20.
In this article, we model the relationship between two circular variables using the circular regression models, to be called JS circular regression model, which was proposed by Jammalamadaka and Sarma (1993). The model has many interesting properties and is sensitive enough to detect the occurrence of outliers. We focus our attention on the problem of identifying outliers in this model. In particular, we extend the use of the COVRATIO statistic, which has been successfully used in the linear case for the same purpose, to the JS circular regression model via a row deletion approach. Through simulation studies, the cut-off points for the new procedure are obtained and its power of performance is investigated. It is found that the performance improves when the resulting residuals have small variance and when the sample size gets larger. An example of the application of the procedure is presented using a real dataset.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号