Similar literature
 20 similar documents found; search took 31 ms
1.
This study investigates the influences of additive outliers on financial durations. An outlier test statistic and an outlier detection procedure are proposed to detect and estimate outlier effects for the logarithmic Autoregressive Conditional Duration (Log-ACD) model. The proposed test statistic has an exact sampling distribution and performs very well, in terms of size and power, in a series of Monte Carlo simulations. Furthermore, the test statistic is robust to several alternative distribution assumptions. An empirical application shows that parameter estimates without considering outliers tend to be biased.

2.
Although the poor performance of the mean as a location estimate when outliers are present in the data is well-known, there has been no clear consensus as to whether robust estimation or outlier detection is the appropriate corrective procedure. In this paper, the estimation accuracy of the sample mean and of 27 robust estimation and outlier detection techniques is compared by computer simulation. Both symmetric and asymmetric contamination are considered. It is shown that the proper class of estimates depends on the degree of contamination, whether the contamination is symmetric or asymmetric, and the sample size. Several data sets considered previously by Rocke et al. (1982) are also examined.
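The contrast between robust estimation and outlier detection drawn in this abstract can be illustrated on a toy sample. The sketch below is not taken from the paper: the sample, the 20% trimming proportion, and the MAD-based rejection rule with threshold 3 are assumptions chosen only for illustration.

```python
import statistics

def trimmed_mean(values, proportion):
    """Robust estimation: drop a fixed share of each tail, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def mean_after_mad_rejection(values, threshold=3.0):
    """Outlier detection: reject points far from the median, then average.

    The 1.4826 factor scales the MAD to the standard deviation under
    normality; the threshold of 3 is an illustrative assumption.
    """
    med = statistics.median(values)
    mad = 1.4826 * statistics.median(abs(v - med) for v in values)
    kept = [v for v in values if abs(v - med) <= threshold * mad]
    return sum(kept) / len(kept)

# Hypothetical sample: eight values near 10 plus two asymmetric contaminants.
SAMPLE = [9.8, 10.1, 9.9, 10.2, 10.0, 10.1, 9.9, 10.0, 55.0, 60.0]

if __name__ == "__main__":
    print("mean:", statistics.mean(SAMPLE))
    print("median:", statistics.median(SAMPLE))
    print("20% trimmed mean:", trimmed_mean(SAMPLE, 0.2))
    print("mean after MAD rejection:", mean_after_mad_rejection(SAMPLE))
```

On this sample the plain mean is pulled far above 10, while the median, the trimmed mean, and the reject-then-average estimate all stay close to it. A single toy sample says nothing about which class of estimate is preferable in general; that is the question the simulation study addresses.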

3.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on the multivariate t distributions can be considered as a robust alternative approach. Missing data are inevitable in many situations and parameter estimates could be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in regression function in the presence of outliers and missing values. Along with the robust parameter estimation, our proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even under the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

4.
Efficient score tests exist, among others, for testing the presence of additive and/or innovative outliers that are the result of the shifted mean of the error process under the regression model. A sample influence function of autocorrelation-based diagnostic technique also exists for the detection of outliers that are the result of the shifted autocorrelations. The latter diagnostic technique is, however, not useful if the outlying observation does not affect the autocorrelation structure but is generated by an inflation in the variance of the error process under the regression model. In this paper, we develop a unified maximum studentized type test which is applicable for testing the additive and innovative outliers as well as variance-shifted outliers that may or may not affect the autocorrelation structure of the outlier-free time series observations. Since the computation of the p-values for the maximum studentized type test is not easy in general, we propose a Satterthwaite type approximation based on suitable doubly non-central F-distributions for finding such p-values [F.E. Satterthwaite, An approximate distribution of estimates of variance components, Biometrics 2 (1946), pp. 110–114]. The approximations are evaluated through a simulation study, for example, for the detection of additive and innovative outliers as well as variance-shifted outliers that do not affect the autocorrelation structure of the outlier-free time series observations. Some simulation results on model misspecification effects on outlier detection are also provided.

5.
This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo‐model for outliers and is more robust to outliers than the Bayes factor based on the variance‐inflation model for outliers. If an observation is considered an outlier for both models this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other then this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model considering the observation as an outlier. For moderate outliers where the variance‐inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance‐inflation model for outliers.

6.
Parameter estimation is the first step in constructing control charts. One of these parameters is the process mean. The classical estimators of the process mean are sensitive to the presence of outlying data and subgroups which contaminate the whole data. Existing robust estimators for the process mean consider only the effects of individual outliers, while in this paper a robust estimator is proposed to reduce the effect of outlying subgroups as well as of individual outliers within a subgroup. The proposed estimator was compared with some classical and robust estimators of the process mean. Although its relative efficiency ranks fourth among the estimators tested, its robustness and efficiency are large when outlying subgroups are present. Evaluation of the results indicated that the proposed estimator is less sensitive to the presence of outliers and estimates the process mean well when there are no individual outliers or outlying subgroups.

7.
Elevation in C-reactive protein (CRP) is an independent risk factor for cardiovascular disease progression and levels are reduced by treatment with statins. However, on-treatment CRP, given baseline CRP and treatment, is not normally distributed and outliers exist even when transformations are applied. Although classical non-parametric tests address some of these issues, they do not enable straightforward inclusion of covariate information. The aims of this study were to produce a model that improved efficiency and accuracy of analysis of CRP data. Estimation of treatment effects and identification of outliers were addressed using controlled trials of rosuvastatin. The robust statistical technique of MM-estimation was used to fit models to data in the presence of outliers and was compared with least-squares estimation. To develop the model, appropriate transformations of the response and baseline variables were selected. The model was used to investigate how on-treatment CRP related to baseline CRP and estimated treatment effects with rosuvastatin. On comparing least-squares and MM-estimation, MM-estimation was superior to least-squares estimation in that parameter estimates were more efficient and outliers were clearly identified. Relative reductions in CRP were higher at higher baseline CRP levels. There was also evidence of a dose-response relationship between CRP reductions from baseline and rosuvastatin. Several large outliers were identified, although there did not appear to be any relationships between the incidence of outliers and treatments. In conclusion, using robust estimation to model CRP data is superior to least-squares estimation and non-parametric tests in terms of efficiency, outlier identification and the ability to include covariate information.

8.
In multiple linear regression analysis, the ridge regression estimator and the Liu estimator are often used to address multicollinearity. Besides multicollinearity, outliers are also a problem in multiple linear regression analysis. We propose new biased estimators based on the least trimmed squares (LTS) ridge estimator and the LTS Liu estimator for the case where both outliers and multicollinearity are present. For this purpose, a simulation study is conducted in order to compare the robust ridge estimator and the robust Liu estimator in terms of their effectiveness, measured by the mean square error. In our simulations, the behavior of the new biased estimators is examined for three types of outliers: X-space outlier, Y-space outlier, and X- and Y-space outlier. The results for a number of different illustrative cases are presented. This paper also provides the results for the robust ridge regression and robust Liu estimators based on a real-life data set combining the problem of multicollinearity and outliers.

9.
The stalactite plot for the detection of multivariate outliers
Detection of multiple outliers in multivariate data using Mahalanobis distances requires robust estimates of the means and covariance of the data. We obtain this by sequential construction of an outlier free subset of the data, starting from a small random subset. The stalactite plot provides a cogent summary of suspected outliers as the subset size increases. The dependence on subset size can be virtually removed by a simulation-based normalization. Combined with probability plots and resampling procedures, the stalactite plot, particularly in its normalized form, leads to identification of multivariate outliers, even in the presence of appreciable masking.
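The sequential construction described in this abstract can be sketched for bivariate data as follows. This is a simplified illustration, not the authors' procedure: the starting subset is chosen deterministically (the points closest to the coordinate-wise median) rather than at random, the cutoff is a plain chi-square quantile with 2 degrees of freedom instead of the simulation-based normalization, and the data and all function names are hypothetical.

```python
import math
import statistics

def fit_2d(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def forward_search(points, start_size=5, prob=0.99):
    """Return {subset size m: indices flagged when fitting on m points}."""
    n = len(points)
    # Chi-square quantile for 2 degrees of freedom: -2 * ln(1 - prob).
    cutoff = -2.0 * math.log(1.0 - prob)
    med_x = statistics.median(x for x, _ in points)
    med_y = statistics.median(y for _, y in points)
    # Assumption: deterministic start from the points nearest the median.
    subset = sorted(
        range(n),
        key=lambda i: (points[i][0] - med_x) ** 2 + (points[i][1] - med_y) ** 2,
    )[:start_size]
    flagged = {}
    for m in range(start_size, n + 1):
        center, cov = fit_2d([points[i] for i in subset])
        d2 = [mahalanobis_sq(p, center, cov) for p in points]
        flagged[m] = sorted(i for i in range(n) if d2[i] > cutoff)
        # Next subset: the m + 1 points closest to the current fit.
        subset = sorted(range(n), key=lambda i: d2[i])[: m + 1]
    return flagged

# Hypothetical data: a 3x3 grid of clean points plus two planted outliers
# at indices 9 and 10.
POINTS = [
    (0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
    (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0),
    (8.0, 8.0), (9.0, -7.0),
]

if __name__ == "__main__":
    for m, idx in forward_search(POINTS).items():
        print(m, idx)
```

Each row of the printed output corresponds to one row of a stalactite plot: the subset size and the observations flagged at that size. On this toy data only the two planted points are flagged while the subset holds clean points alone (sizes 5 through 9); the rows for sizes 10 and 11 show what happens once the fit is forced to absorb the planted points.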

10.
Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.

11.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

12.
We provide a method for simultaneous variable selection and outlier identification using the mean-shift outlier model. The procedure consists of two steps: the first step is to identify potential outliers, and the second step is to perform all possible subset regressions for the mean-shift outlier model containing the potential outliers identified in step 1. This procedure is helpful for model selection while simultaneously considering outlier identification, and can be used to identify multiple outliers. In addition, we can evaluate the impact on the regression model of simultaneous omission of variables and interesting observations. In an example, we provide detailed output from the R system, and compare the results with those using posterior model probabilities as proposed by Hoeting et al. [Comput. Stat. Data Anal. 22 (1996), pp. 252-270] for simultaneous variable selection and outlier identification.

13.
Principal component analysis (PCA) is a popular technique that is useful for dimensionality reduction, but it is affected by the presence of outliers. The outlier sensitivity of classical PCA (CPCA) has led to the development of new approaches. The effects of replacing outliers with estimates obtained by expectation–maximization (EM) and multiple imputation (MI) were examined on an artificial and a real data set. Furthermore, robust PCA based on the minimum covariance determinant (MCD), PCA based on estimates obtained by EM instead of outliers, and PCA based on estimates obtained by MI instead of outliers were compared with the results of CPCA. In this study, we tried to show the effects of using estimates obtained by MI and EM instead of outliers, depending on the ratio of outliers in the data set. Finally, when the ratio of outliers exceeds 20%, we suggest the use of estimates obtained by MI and EM instead of outliers as an alternative approach.

14.
Some recent contributions to robust data analysis and multiple outlier detection are discussed. Two methods of analysis producing robust estimates and sets of weights which may be inspected for outliers are described and compared. Some examples of their application are given to support the recommendation that both ordinary least squares and a robust method of analysis should be part of routine data analysis.

15.
The influence of observations on the parameter estimates for the simple structural errors-in-variables model with no equation error, under the Student-t distribution, is investigated using the local influence approach. The main conclusion is that the Student-t model with small degrees of freedom is able to incorporate possible outliers and influential observations in the data. The likelihood displacement approach is useful for outlier detection, especially when a masking phenomenon is present and the degrees of freedom parameter is large. The diagnostics are illustrated with two examples.

16.
Whether an extreme observation is an outlier or not depends strongly on the corresponding tail behavior of the underlying distribution. We develop an automatic, data-driven method rooted in the mathematical theory of extremes to identify observations that deviate from the intermediate and central characteristics. The proposed algorithm is an extension of a method previously proposed in the literature for the specific case of heavy tailed Pareto-type distributions to all max-domains of attraction. We propose some applications such as a tail-adjusted boxplot which yields a more accurate representation of possible outliers, and the identification of outliers in a multivariate context through an analysis of associated random variables such as local outlier factors. Several examples and simulation results illustrate the finite sample behavior of the algorithm and its applications.

17.
Test procedures for outlier detection problems for the Gumbel distribution are rarely available. Hence, a test statistic is proposed here for detection of a pair of upper and lower outliers from a Gumbel distribution with known scale parameter. The critical values of the statistic are obtained and some examples are also given to highlight the use of the statistic. The advantage of the proposed statistic is that the scale parameter, though assumed to be known, is not explicitly involved in the determination of the critical values.

18.
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.

19.
Quantitative traits measured over pedigrees of individuals may be analysed using maximum likelihood estimation, assuming that the trait has a multivariate normal distribution. This approach is often used in the analysis of mixed linear models. In this paper a robust version of the log likelihood for multivariate normal data is used to construct M-estimators which are resistant to contamination by outliers. The robust estimators are found using a minimisation routine which retains the flexible parameterisations of the multivariate normal approach. Asymptotic properties of the estimators are derived, computation of the estimates and their use in outlier detection tests are discussed, and a small simulation study is conducted.

20.
The author presents a robust F-test for comparing nested linear models. It is suggested that the approach will be attractive to practitioners because it is based on the familiar F-statistic and corresponds to the common practice of reporting F-statistics after removing obvious outliers. It is calibrated in terms of a real parameter that can be directly interpreted as the willingness of the data analyst to remove observations, and the sensitivity of the F-statistic to this parameter is easily examined. The procedure is evaluated with a simulation study where a scale mixture distribution is used to generate outliers. The procedure is also applied to some data where the occurrence of an outlier is confounded with the significance of a regression term. This provides a comparison of two competing models for the data: one removing an outlier and the other including an additional regression term instead.
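The common practice this abstract refers to, reporting an F-statistic after removing an obvious outlier, can be illustrated with the ordinary F-statistic for a straight-line model against the nested intercept-only model. The sketch below shows only that classical statistic and its sensitivity to one observation; it is not the author's robust F-test, and the data are hypothetical.

```python
def f_statistic(x, y):
    """Classical F-statistic for y = a + b*x versus the intercept-only model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Residual sum of squares of the intercept-only (reduced) model.
    rss_reduced = sum((yi - mean_y) ** 2 for yi in y)
    # Residual sum of squares of the least-squares line (full model).
    rss_full = rss_reduced - sxy ** 2 / sxx
    # One extra parameter in the full model; n - 2 residual degrees of freedom.
    return (rss_reduced - rss_full) / (rss_full / (n - 2))

# Hypothetical data: a near-linear trend with one gross outlier at the end.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [1.1, 1.9, 3.2, 3.8, 5.1, 0.0]

if __name__ == "__main__":
    print("F with all observations:", f_statistic(X, Y))
    print("F without the last observation:", f_statistic(X[:-1], Y[:-1]))
```

With all six observations the slope looks insignificant, while dropping the last point makes the linear term overwhelmingly significant. This is the kind of sensitivity to the removal of observations that the abstract says the proposed procedure lets the analyst examine.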
