Similar literature
 20 similar articles found (search time: 31 ms)
1.
Logistic regression modeling has received a great deal of attention in the literature in recent years, covering all aspects of the model including the identification of outliers. A variety of methods for identifying outliers, such as the standardized Pearson residuals, are now available. These methods, however, are successful only if the data contain a single outlier. When the data contain multiple outliers, which is often the case in practice, these methods fail to detect them because of the well-known masking (false negative) and swamping (false positive) effects. In this article, we propose a new method for the identification of multiple outliers in logistic regression. We develop a generalized version of standardized Pearson residuals based on group deletion and then propose a technique for identifying multiple outliers. The performance of the proposed method is investigated through several examples.
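As a rough illustration of the single-case diagnostic that the abstract starts from, the sketch below computes standardized Pearson residuals r_i = (y_i - p_i) / sqrt(p_i (1 - p_i) (1 - h_i)) for a logistic regression with one predictor. It is not the group-deletion method proposed in the article. The function names, the Newton-Raphson fit, and the toy data are assumptions made for this example only.

```python
import math

def _probs(x, b0, b1):
    """Fitted probabilities of the model logit(p) = b0 + b1 * x."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def _information(x, w):
    """Entries a, b, c of the 2x2 matrix X'WX = [[a, b], [b, c]] and its determinant."""
    a = sum(w)
    b = sum(wi * xi for wi, xi in zip(w, x))
    c = sum(wi * xi * xi for wi, xi in zip(w, x))
    return a, b, c, a * c - b * b

def fit_logistic(x, y, iters=50):
    """Maximum likelihood fit by Newton-Raphson (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = _probs(x, b0, b1)
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))                # score for b0
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))  # score for b1
        a, b, c, det = _information(x, w)
        # Newton step: (X'WX)^{-1} times the score vector
        b0 += (c * g0 - b * g1) / det
        b1 += (-b * g0 + a * g1) / det
    return b0, b1

def standardized_pearson(x, y):
    """Return a list of (standardized Pearson residual, leverage) pairs."""
    b0, b1 = fit_logistic(x, y)
    p = _probs(x, b0, b1)
    w = [pi * (1.0 - pi) for pi in p]
    a, b, c, det = _information(x, w)
    out = []
    for xi, yi, pi, wi in zip(x, y, p, w):
        # Leverage h_i = w_i * (1, x_i) (X'WX)^{-1} (1, x_i)'
        h = wi * (c - 2.0 * b * xi + a * xi * xi) / det
        out.append(((yi - pi) / math.sqrt(wi * (1.0 - h)), h))
    return out

# Toy data (assumed): the response mostly rises with x, except the last case.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
results = standardized_pearson(x, y)
residuals = [r for r, _ in results]
leverages = [h for _, h in results]
```

With a single discordant case, as in the toy data, the residual of largest magnitude points at that case. The masking problem described above arises when several such cases pull the fit toward themselves, which is what motivates deleting groups of observations rather than one at a time.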

2.
We study alternative models for capturing abrupt structural changes (level shifts) in a time series. The problem is confounded by the presence of transient outliers. We compare the performance of non-Gaussian time-varying parameter models and multiprocess mixture models within a Monte Carlo experimental setup. Our findings suggest that once we incorporate shocks with thick-tailed probability distributions, the superiority of the multiprocess mixture models over the time-varying parameter models, reported in an earlier study, disappears. The behavior of the two models, both in adapting to level shifts and in reacting to transient outliers, is very similar.

3.
Outliers in multilevel data   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for exploring data for outliers and for outlier analysis, and then applies these to the detailed analysis of outliers in two large-scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

4.
This paper documents situations where the variance inflation model for outliers has undesirable properties. The model is commonly used to accommodate outliers in a Bayesian analysis of regression and time series models. The alternative approach provided here does not suffer from these undesirable properties but gives inferences similar to those of the variance inflation model when this is appropriate. It can be used with regression, time series, and regression with correlated errors in a unified way, and adheres to the scientific principle that inference should be based on the data after obvious outliers have been discarded. Only one parameter is required for outliers; it is interpretable as the a priori willingness to remove observations from the analysis.

5.
Outliers can arise as readily in sample survey (i.e. finite population) data as in samples from infinite populations. For infinite populations an extensive methodology exists, but very little has been written on the finite population case. We explore matters of definition and concept to formulate some basic principles for handling outliers in sample survey data. Some existing methods for outlier accommodation are reviewed, and proposals are made for the dual problem of outlier tests of discordancy.

6.
A Cook statistic is developed for detecting outliers in two situations in which outliers are likely to occur in multi-response experiments. In the first situation, more than one outlying observation vector is considered. Each of these vectors is obtained on the assumption that a particular observation from each of the responses is an outlier. A general expression of the Cook statistic for detecting any t such outlying observation vectors is obtained, and some particular cases are then considered. In the second situation, observations from all the responses need not be outliers. Here also a general expression of the Cook statistic is obtained, for detecting any t observations from each of any k responses as outliers. In both cases the Cook statistic is applied to real experimental data.

7.
In this study, we aim to detect outliers in large medical records and exclude them, because outliers affect the accuracy and efficiency of the analysis. We compare the traditional method with a method based on the frontier model. Specifically, we detect outliers in cost and length of stay for pneumonia patients from HCUP 2002 and 2003.

8.
In geostatistics, detecting atypical observations is of special interest because of the changes they can cause in environmental and geological patterns. Several methods for detecting them have already been suggested for the univariate spatial case. However, the problem is more complicated when several variables are observed simultaneously and the spatial correlation among them must be taken into account. The aim of this paper is to detect outliers and influential observations in multivariate spatial linear models. For this purpose, we derive and explore two different methods. First, a multivariate version of the forward search algorithm is given, where locations with outliers are detected in the last steps of the procedure. Next, we derive influence measures to assess the impact of the observations on the multivariate spatial linear model. The procedures are easy to compute and to interpret by means of graphical representations. Finally, an example and a Monte Carlo study illustrate the performance of these methods for identifying outliers in multivariate spatial linear models.

9.
We consider integer-valued autoregressive models of order one contaminated with innovational outliers. Assuming that the time points of the outliers are known but their sizes are unknown, we prove that Conditional Least Squares (CLS) estimators of the offspring and innovation means are strongly consistent. In contrast, CLS estimators of the outliers' sizes are not strongly consistent. We also prove that the joint CLS estimator of the offspring and innovation means is asymptotically normal. Conditionally on the values of the process at time points preceding the outliers' occurrences, the joint CLS estimator of the sizes of the outliers is asymptotically normal.
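For orientation, the sketch below covers only the uncontaminated baseline: it simulates an INAR(1) process X_t = alpha ∘ X_{t-1} + e_t with binomial thinning and computes the CLS estimates of the offspring mean alpha and the innovation mean mu by minimizing the sum of (X_t - alpha X_{t-1} - mu)^2. The outlier terms studied in the article are not modeled. Poisson innovations, the function names, and the parameter values are assumptions chosen for illustration.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(n, alpha, mu, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(mu) (assumed)."""
    rng = random.Random(seed)
    xs = [poisson_draw(rng, mu / (1.0 - alpha))]  # start near the stationary mean
    for _ in range(n - 1):
        # Binomial thinning: each of the X_{t-1} units survives with probability alpha
        survivors = sum(1 for _ in range(xs[-1]) if rng.random() < alpha)
        xs.append(survivors + poisson_draw(rng, mu))
    return xs

def cls_estimates(xs):
    """CLS estimates (alpha_hat, mu_hat): least squares of X_t on X_{t-1}."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    alpha_hat = sxy / sxx
    mu_hat = mean_curr - alpha_hat * mean_prev
    return alpha_hat, mu_hat

series = simulate_inar1(5000, alpha=0.5, mu=2.0, seed=1)
alpha_hat, mu_hat = cls_estimates(series)
```

Because the conditional mean of X_t given X_{t-1} is linear, the CLS criterion reduces to an ordinary least squares fit of the series on its own lag, which is why closed-form estimates are available here.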

10.
There are three main problems in the existing procedures for detecting outliers in ARIMA models. The first one is the biased estimation of the initial parameter values that may strongly affect the power to detect outliers. The second problem is the confusion between level shifts and innovative outliers when the series has a level shift. The third problem is masking. We propose a procedure that keeps the powerful features of previous methods but improves the initial parameter estimate, avoids the confusion between innovative outliers and level shifts and includes joint tests for sequences of additive outliers in order to solve the masking problem. A Monte Carlo study and one example of the performance of the proposed procedure are presented.

11.
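Zhao Jinwen, 《统计研究》 (Statistical Research), 2010, 27(12): 92-98
In econometric modeling, assessing and diagnosing the influence of outliers is becoming increasingly important. This paper provides typical cases in which outliers have a fatal impact on econometric tests such as multicollinearity tests, serial correlation tests, heteroscedasticity tests and unit root tests, offering convincing data material for the teaching of econometrics and for research on related modeling theory.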

12.
The identification of outlier vectors in block designs for incomplete multiresponse experiments is considered. The design is composed of two sets of experimental units, and different numbers of response variables are observed from these two sets. A Cook statistic is developed for identifying outliers and is illustrated with a real-life data set. It is shown that the presence of outliers can distort the overall conclusion drawn from an experiment.

13.
In this article, we present an M-estimator for the parameters of the extended three-parameter Burr Type III distribution for complete data with outliers. Confidence intervals for all parameters can be obtained from the M-estimator's normal approximation. The simulation results show that the M-estimator generally outperforms the maximum likelihood and least squares methods in terms of bias and root mean square error. We also investigate the M-estimator's impact on different quantiles and the mean for the Weibull and normal distributions with outliers. Two numerical examples are used to demonstrate the performance of our proposed method.

14.
Zerbet and Nikulin presented the new statistic Z_k for detecting outliers in the exponential distribution. They also compared this statistic with Dixon's statistic D_k. In this article, we extend this approach to the gamma distribution and compare the result with Dixon's statistic. The results show that the test based on the statistic Z_k is more powerful than the test based on Dixon's statistic.

15.
Many methods have been developed for detecting multiple outliers in a single multivariate sample, but very few for the case where there may be groups in the data. We propose a method of simultaneously determining groups (as in cluster analysis) and detecting outliers, which are points that are distant from every group. Our method is an adaptation of the BACON algorithm proposed by Billor, Hadi and Velleman for the robust detection of multiple outliers in a single group of multivariate data. There are two versions of our method, depending on whether or not the groups can be assumed to have equal covariance matrices. The effectiveness of the method is illustrated by its application to two real data sets and further shown by a simulation study for different sample sizes and dimensions for 2 and 3 groups, with and without planted outliers in the data. When the number of groups is not known in advance, the algorithm could be used as a robust method of cluster analysis, by running it for various numbers of groups and choosing the best solution.

16.
The quantile residual lifetime function provides comprehensive quantitative measures for residual life, especially when the distribution of the latter is skewed or heavy‐tailed and/or when the data contain outliers. In this paper, we propose a general class of semiparametric quantile residual life models for length‐biased right‐censored data. We use the inverse probability weighted method to correct the bias due to length‐biased sampling and informative censoring. Two estimating equations corresponding to the quantile regressions are constructed in two separate steps to obtain an efficient estimator. Consistency and asymptotic normality of the estimator are established. The main difficulty in implementing our proposed method is that the estimating equations associated with the quantiles are nondifferentiable; we therefore apply the majorize–minimize algorithm and estimate the asymptotic covariance using an efficient resampling method. We use simulation studies to evaluate the proposed method and illustrate its application with a real‐data example.

17.
Outliers are to be found among the extremes of a data set. Extremes are examples of order statistics. It is thus relevant to ask to what extent the statistical methods (and probabilistic properties) of outliers and of order statistics coincide and depend on each other. Whilst clear overlap is identifiable, aims and procedures are often quite distinct and each topic plays its own important role in the panoply of statistical principles and methodology.

18.
In this paper, a penalized weighted composite quantile regression estimation procedure is proposed to estimate unknown regression parameters and autoregression coefficients in the linear regression model with heavy-tailed autoregressive errors. Under some conditions, we show that the proposed estimator possesses the oracle properties. In addition, we introduce an iterative algorithm to solve the proposed optimization problem, and use a data-driven method to choose the tuning parameters. Simulation studies demonstrate that the proposed estimation method is robust and works much better than the least-squares-based method when there are outliers in the dataset or the autoregressive errors follow a heavy-tailed distribution. Moreover, the proposed estimator performs comparably to the least-squares-based estimator when there are no outliers and the error is normal. Finally, we apply the proposed methodology to analyze the electricity demand dataset.

19.
When a process is monitored with a T² control chart in a Phase II setting, the MYT decomposition is a valuable diagnostic tool for interpreting signals in terms of the process variables. The decomposition splits a signaling T² statistic into independent components that can be associated with either individual variables or groups of variables. Since these components are T² statistics with known distributions, they can be used to determine which of the process variable(s) contribute to the signal. However, this procedure cannot be applied directly to Phase I since the distributions of the individual components are unknown. In this article, we develop the MYT decomposition procedure for a Phase I operation, when monitoring a random sample of individual observations and identifying outliers. We use a relationship between the T² statistic in Phase I and the corresponding T² statistic that results when an observation is omitted from this sample to derive the distributions of these components and demonstrate the Phase I application of the MYT decomposition.
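The arithmetic of the decomposition can be sketched for the simplest case of two variables, where T² = T²_1 + T²_{2·1}: an unconditional term for the first variable plus a conditional term for the second variable given the first. The code below estimates the mean and covariance from the same sample, as in Phase I, but it does not address the component distributions derived in the article. The function name and the toy data are assumptions for illustration.

```python
def t2_myt_bivariate(data):
    """For each observation return (T2, T2_1, T2_2given1) in two dimensions."""
    n = len(data)
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    # Sample covariance entries with divisor n - 1
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    det = s11 * s22 - s12 * s12
    out = []
    for x1, x2 in data:
        d1, d2 = x1 - m1, x2 - m2
        # Hotelling T2 using the explicit inverse of the 2x2 covariance matrix
        t2 = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
        # Unconditional term for variable 1
        t2_1 = d1 * d1 / s11
        # Conditional term: squared residual of x2 regressed on x1, scaled by
        # the conditional variance s22 - s12^2 / s11
        resid = d2 - (s12 / s11) * d1
        t2_2given1 = resid * resid / (s22 - s12 * s12 / s11)
        out.append((t2, t2_1, t2_2given1))
    return out

# Toy sample (assumed): x2 is roughly 2 * x1, except for the last observation.
sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (3.0, 9.0)]
terms = t2_myt_bivariate(sample)
```

In the toy sample the last observation has an unremarkable value of the first variable, so its unconditional term is zero, while its conditional term is the largest in the sample. This is the kind of signal that the decomposition attributes to a broken relationship between variables rather than to a single variable.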

20.
The zero‐inflated Poisson regression model is a special case of finite mixture models that is useful for count data containing many zeros. Typically, maximum likelihood (ML) estimation is used for fitting such models. However, it is well known that the ML estimator is highly sensitive to the presence of outliers and can become unstable when mixture components are poorly separated. In this paper, we propose an alternative robust estimation approach, robust expectation‐solution (RES) estimation. We compare the RES approach with an existing robust approach, minimum Hellinger distance (MHD) estimation. Simulation results indicate that both methods improve on ML when outliers are present and/or when the mixture components are poorly separated. However, the RES approach is more efficient in all the scenarios we considered. In addition, the RES method is shown to yield consistent and asymptotically normal estimators and, in contrast to MHD, can be applied quite generally.
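To make the mixture structure concrete, the sketch below evaluates the zero-inflated Poisson probability mass function without covariates: with probability pi0 the count is a structural zero, and otherwise it is drawn from a Poisson(lam) distribution. The names `zip_pmf`, `lam` and `pi0` are assumptions for this example; the regression model in the abstract would additionally link these parameters to covariates, and the RES and MHD estimators are not shown.

```python
import math

def zip_pmf(k, lam, pi0):
    """P(Y = k) for a zero-inflated Poisson with rate lam and zero-inflation pi0."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # A zero can come from either mixture component; positive counts only from the Poisson
    return (pi0 if k == 0 else 0.0) + (1.0 - pi0) * poisson

lam, pi0 = 2.0, 0.3
probs = [zip_pmf(k, lam, pi0) for k in range(60)]
total = sum(probs)                               # close to 1
mean = sum(k * p for k, p in enumerate(probs))   # close to (1 - pi0) * lam
```

The probability of a zero, pi0 + (1 - pi0) exp(-lam), exceeds the plain Poisson value exp(-lam) whenever pi0 > 0, which is the excess of zeros the model is designed to capture.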
