Similar literature
 20 similar documents found (search time: 31 ms)
1.
When a data set contains an outlier, the efficiency of traditional methods decreases. To address this problem, Kadilar et al. (2007) adapted the Huber-M method, which is only one of several robust regression methods, to ratio-type estimators and reduced the effect of the outlier. In this study, new ratio-type estimators are proposed, based on Kadilar et al. (2007), by considering the Tukey-M, Hampel-M, Huber-MM, LTS, LMS and LAD robust methods. We derive the mean square error (MSE) of these estimators theoretically and compare the MSE values of the proposed estimators with those of the estimators based on the Huber-M and OLS methods. These comparisons show that the proposed estimators are more efficient than both the Huber-M approach of Kadilar et al. (2007) and the OLS approach. Moreover, under all conditions, all of the proposed estimators except the one based on the LAD method are more efficient than the robust estimators proposed by Kadilar et al. (2007). These theoretical results are supported by a numerical example and a simulation based on data that include an outlier.

2.
The zero-inflated Poisson distribution has been used to model count data in different contexts. This model tends to be influenced by outliers because of the excessive occurrence of zeroes, so outlier identification and robust parameter estimation are important for this distribution. Some outlier identification methods are studied in this paper, and their applications and results are presented with an example. To eliminate the effect of outliers, two robust parameter estimates are proposed, based on the trimmed mean and the Winsorized mean. Simulation results show the robustness of the proposed parameter estimates.
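
As a rough illustration of the two building blocks named in this abstract, the following sketch computes a trimmed mean and a Winsorized mean for a small count sample. It is not the paper's zero-inflated Poisson estimator; the function names, the trimming proportion and the sample `counts` are assumptions made up for this example.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))      # observations cut from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after replacing each tail with the nearest retained value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    k = int(proportion * n)
    if k > 0:
        data[:k] = [data[k]] * k             # pull the lower tail up
        data[n - k:] = [data[n - k - 1]] * k  # pull the upper tail down
    return sum(data) / n


# Hypothetical count sample with many zeroes and one extreme value (40).
counts = [0, 0, 0, 0, 1, 1, 2, 2, 3, 40]

print(sum(counts) / len(counts))     # 4.9   (plain mean, dominated by the 40)
print(trimmed_mean(counts, 0.1))     # 1.125 (one value dropped from each tail)
print(winsorized_mean(counts, 0.1))  # 1.2   (the 40 is replaced by 3)
```

Trimming discards the tail observations, while Winsorizing keeps the sample size and only caps them; both limit how far a single extreme count can move the estimate.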

3.
We propose a new robust regression estimator that uses a data partition technique and M estimation (DPM). The data partition technique is designed to define a small fixed number of subsets of the partitioned data set and to produce a corresponding ordinary least squares (OLS) fit in each subset, in contrast to the resampling technique of existing robust estimators such as the least trimmed squares estimator. The proposed estimator shares a common strategy with the median ball algorithm estimator, which is obtained from OLS trial fits on only a fixed number of subsets of the data. We examine the performance of the DPM estimator on eleven challenging data sets and in simulation studies. We also compare the DPM with five commonly used robust estimators in terms of empirical convergence rates relative to OLS for clean data, robustness as measured by mean squared error and bias, masking and swamping probabilities, the ability to detect known outliers, and regression and affine equivariance.

4.
Ordinary least squares (OLS) estimators for a linear model are very sensitive to unusual values in the design space and to outliers among the y values. Even a single atypical value may have a large effect on the parameter estimates. This article reviews and describes some available and popular robust techniques, including some recently developed ones, and compares them in terms of breakdown point and efficiency. In addition, we use a simulation study and a real data application to compare the performance of existing robust methods under different scenarios.
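
The sensitivity of OLS to a single atypical y value can be seen in a small sketch. The code below fits a straight line by OLS and by a Huber M-estimator computed with iteratively reweighted least squares, one common way to compute such estimates. It is not taken from any of the papers listed here; the data, the helper names, the tuning constant `c = 1.345` and the MAD-based scale are assumptions chosen for illustration.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(iterations):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        # Robust residual scale: normalized median absolute deviation.
        scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
        if scale == 0:
            break
        cutoff = c * scale
        # Huber weights: 1 inside the cutoff, cutoff/|r| outside it.
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Hypothetical data near y = 1 + 2x, with one outlier in y at x = 4.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.9, 5.05, 6.95, 39.0, 11.1, 12.9, 15.05, 16.95, 19.0]

a_ols, b_ols = weighted_line_fit(x, y, [1.0] * len(x))
a_hub, b_hub = huber_line_fit(x, y)
print("OLS:  ", a_ols, b_ols)
print("Huber:", a_hub, b_hub)
```

In this setup the single outlier drags the OLS intercept and slope away from the line followed by the other nine points, while the Huber fit stays close to it. Huber weights bound the influence of large residuals but do not protect against high-leverage points, which is one reason reviews like this one also compare breakdown points.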

5.
Response surfaces express the behavior of responses and can be used for both single- and multi-response problems. A common approach to estimating a response surface from experimental results is the ordinary least squares (OLS) method. Since OLS is very sensitive to outliers, some robust approaches have been discussed in the literature. Although many methods for multiple response optimization are available in the literature, there are few studies on model building, especially on robust models. Assuming correlated responses, this paper proposes a robust coefficient estimation method for the multi-response problem based on M-estimators. To illustrate the performance of the proposed procedure, a contaminated experimental design is used, built from a numerical example available in the literature with some modifications. Both the classical multivariate least squares method and the proposed robust multivariate approach are used to estimate the regression coefficients of multi-response surfaces based on this example. Moreover, a comparison of the proposed robust multi-response surface (RMRS) approach with separate robust estimation of single responses shows that the proposed approach is more efficient.

6.
It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation (MI) methods tend to impute nonextreme values, making the outlier less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing nine MI methods, several established ones and a newly proposed one, on several actual clinical laboratory data sets of different dimensions. Two criteria, an ‘outlier recovery probability’ and a ‘relative accuracy measure’, are developed, based on depth functions. Three outlier identifiers, based on Mahalanobis distance, robust Mahalanobis distance, and generalized principal component analysis, are also included in the study. Consequently, the study compares not only imputation methods but also outlier detection methods. Our findings show that the performance of an MI method depends on the choice of depth-based outlier detection criterion, as well as on the size and dimension of the data and the fraction of missing components. By taking these features into account, an MI method for a given data set can be selected more optimally.

7.
Multivariate mixture regression models can be used to investigate the relationships between two or more response variables and a set of predictor variables by taking into consideration unobserved population heterogeneity. It is common to take multivariate normal distributions as mixing components, but this mixing model is sensitive to heavy-tailed errors and outliers. Although normal mixture models can approximate any distribution in principle, the number of components needed to account for heavy-tailed distributions can be very large. Mixture regression models based on multivariate t distributions can be considered as a robust alternative. Missing data are inevitable in many situations, and parameter estimates can be biased if the missing values are not handled properly. In this paper, we propose a multivariate t mixture regression model with missing information to model heterogeneity in the regression function in the presence of outliers and missing values. Along with robust parameter estimation, the proposed method can be used for (i) visualization of the partial correlation between response variables across latent classes and heterogeneous regressions, and (ii) outlier detection and robust clustering even in the presence of missing values. We also propose a multivariate t mixture regression model using MM-estimation with missing information that is robust to high-leverage outliers. The proposed methodologies are illustrated through simulation studies and real data analysis.

8.
A criterion for robust estimation of the location and covariance matrix is considered, and its application to outlier labeling is discussed. Unlike the methods based on MVE and MCD, this method is applicable to large and high-dimensional data sets. The proposed method is also robust and has the same breakdown point as the MVE- and MCD-based methods. Furthermore, its computational complexity is significantly lower than that of the other methods.

9.
Control charts for residuals, based on a regression model, require a robust fitting technique that minimizes the error resulting from the fitted model. However, in the multivariate case, when the number of variables is high and the data become complex, traditional fitting techniques such as ordinary least squares (OLS) lose efficiency. In this paper, support vector regression (SVR) is used to construct robust control charts for residuals, called SVR-charts. This choice is based on the fact that SVR is designed to minimize the structural error, whereas other techniques minimize the empirical error. An application shows that SVR methods give competitive results in comparison with OLS and the partial least squares method, in terms of the standard deviation of the prediction error and the standard error of performance. A sensitivity study based on the average run length (ARL) is conducted to evaluate the performance of the SVR-chart, and it shows that the SVR-chart has the best ARL behaviour among the residual control charts compared.

10.
Advances in statistical computing software have led to a substantial increase in the use of ordinary least squares (OLS) regression models in the engineering and applied statistics communities. Empirical evidence suggests that data sets can routinely have 10% or more outliers in many processes. Unfortunately, these outliers typically will render the OLS parameter estimates useless. The OLS diagnostic quantities and graphical plots can reliably identify a few outliers; however, they significantly lose power with increasing dimension and number of outliers. Although there have been recent advances in the methods that detect multiple outliers, improvements are needed in regression estimators that can fit well in the presence of outliers. We introduce a robust regression estimator that performs well regardless of outlier quantity and configuration. Our studies show that the best available estimators are vulnerable when the outliers are extreme in the regressor space (high leverage). Our proposed compound estimator modifies recently published methods with an improved initial estimate and measure of leverage. Extensive performance evaluations indicate that the proposed estimator performs the best and consistently fits the bulk of the data when outliers are present. The estimator, implemented in standard software, provides researchers and practitioners a tool for the model-building process to protect against the severe impact from multiple outliers.

11.
This paper proposes a new robust Bayes factor for comparing two linear models. The factor is based on a pseudo‐model for outliers and is more robust to outliers than the Bayes factor based on the variance‐inflation model for outliers. If an observation is considered an outlier for both models, this new robust Bayes factor equals the Bayes factor calculated after removing the outlier. If an observation is considered an outlier for one model but not the other, this new robust Bayes factor equals the Bayes factor calculated without the observation, but a penalty is applied to the model that considers the observation an outlier. For moderate outliers, where the variance‐inflation model is suitable, the two Bayes factors are similar. The new Bayes factor uses a single robustness parameter to describe a priori belief in the likelihood of outliers. Real and synthetic data illustrate the properties of the new robust Bayes factor and highlight the inferior properties of Bayes factors based on the variance‐inflation model for outliers.

12.
Generalized linear mixed models (GLMMs) are widely used to analyse non-normal response data with extra-variation, but non-robust estimators are still routinely used. We propose robust methods for maximum quasi-likelihood and residual maximum quasi-likelihood estimation to limit the influence of outlying observations in GLMMs. The estimation procedure parallels the development of robust estimation methods in linear mixed models, but with adjustments in the dependent variable and the variance component. The methods proposed are applied to three data sets and a comparison is made with the nonparametric maximum likelihood approach. When applied to a set of epileptic seizure data, the methods proposed have the desired effect of limiting the influence of outlying observations on the parameter estimates. Simulation shows that one of the residual maximum quasi-likelihood proposals has a smaller bias than those of the other estimation methods. We further discuss the equivalence of two GLMM formulations when the response variable follows an exponential family. Their extensions to robust GLMMs and their comparative advantages in modelling are described. Some possible modifications of the robust GLMM estimation methods are given to provide further flexibility for applying the method.

13.
Modern systems of official statistics require the estimation and publication of business statistics for disaggregated domains, for example, industry domains and geographical regions. Outlier robust methods have proven to be useful for small‐area estimation. Recently proposed outlier robust model‐based small‐area methods assume, however, uncorrelated random effects. Spatial dependencies, resulting from similar industry domains or geographic regions, often occur. In this paper, we propose an outlier robust small‐area methodology that allows for the presence of spatial correlation in the data. In particular, we present a robust predictive methodology that incorporates the potential spatial impact from other areas (domains) on the small area (domain) of interest. We further propose two parametric bootstrap methods for estimating the mean‐squared error. Simulations indicate that the proposed methodology may lead to efficiency gains. The paper concludes with an illustrative application by using business data for estimating average labour costs in Italian provinces.

14.
As the ordinary least squares (OLS) method is very sensitive to outliers as well as to correlated responses, this paper proposes a robust coefficient estimation method, based on M-estimators, for multi-response surfaces in multistage processes. In this approach, experimental designs are used in which the intermediate response variables may act as covariates in the next stages. The performance of both the ordinary multivariate OLS and the proposed robust multi-response surface approach is analyzed and compared through extensive simulation experiments. The sum of squared errors in estimating the regression coefficients reveals the efficiency of the proposed robust approach.

15.
16.
Numerous estimation techniques for regression models have been proposed. These procedures differ in how sample information is used in the estimation procedure. The efficiency of ordinary least squares (OLS) estimators implicitly assumes normally distributed residuals and is very sensitive to departures from normality, particularly to "outliers" and thick-tailed distributions. Least absolute deviation (LAD) estimators are less sensitive to outliers and are optimal for Laplace random disturbances, but not for normal errors. This paper reports Monte Carlo comparisons of OLS, LAD, two robust estimators discussed by Huber, three partially adaptive estimators, Newey's generalized method of moments estimator, and an adaptive maximum likelihood estimator based on a normal kernel studied by Manski. This paper is the first to compare the relative performance of some adaptive robust estimators (partially adaptive and adaptive procedures) with some common nonadaptive robust estimators. The partially adaptive estimators are based on three flexible parametric distributions for the errors. These include the power exponential (Box-Tiao) and generalized t distributions, as well as a distribution for the errors that is not necessarily symmetric. The adaptive procedures are "fully iterative" rather than one-step estimators. The adaptive estimators have desirable large-sample properties, but these properties do not necessarily carry over to the small-sample case.

The Monte Carlo comparisons of the alternative estimators are based on four different specifications for the error distribution: a normal, a mixture of normals (or variance-contaminated normal), a bimodal mixture of normals, and a lognormal. Five hundred samples of size 50 are used. The adaptive and partially adaptive estimators perform very well relative to the other estimation procedures considered, and preliminary results suggest that in some important cases they can perform much better than OLS, with 50 to 80% reductions in standard errors.
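
The flavour of such a Monte Carlo comparison can be sketched in the simplest setting, an intercept-only model, where the OLS estimate is the sample mean and the LAD estimate is the sample median. The code below borrows only the design of 500 samples of size 50 from the abstract; the contamination rate (10%), the contaminating standard deviation (10), the seed and all function names are assumptions for illustration, and none of the adaptive estimators studied in the paper is implemented.

```python
import random
import statistics


def simulate_mse(draw, estimator, n=50, replications=500, seed=0):
    """Monte Carlo mean squared error of `estimator` for a true location of 0."""
    rng = random.Random(seed)  # same seed, so every estimator sees the same samples
    total = 0.0
    for _ in range(replications):
        sample = [draw(rng) for _ in range(n)]
        total += estimator(sample) ** 2
    return total / replications


def normal(rng):
    """Standard normal error."""
    return rng.gauss(0.0, 1.0)


def contaminated_normal(rng):
    """Variance-contaminated normal: 10% of draws have standard deviation 10."""
    sd = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, sd)


results = {}
for name, draw in [("normal", normal), ("contaminated", contaminated_normal)]:
    results[name] = {
        "mean": simulate_mse(draw, statistics.mean),      # OLS in this model
        "median": simulate_mse(draw, statistics.median),  # LAD in this model
    }
    print(name, results[name])
```

Under normal errors the mean should show the smaller MSE, while under the variance-contaminated normal the median should be clearly better, which mirrors the trade-off between OLS and LAD described in the abstract.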

17.
This paper studies outlier detection and robust variable selection in the linear regression model. The penalized weighted least absolute deviation (PWLAD) regression estimation method and the adaptive least absolute shrinkage and selection operator (LASSO) are combined to achieve outlier detection and robust variable selection simultaneously. An iterative algorithm is proposed to solve the resulting optimization problem. Monte Carlo studies are conducted to evaluate the finite-sample performance of the proposed methods. The results indicate that the proposed methods perform better in finite samples than the existing methods when there are leverage points or outliers in the response variable or the explanatory variables. Finally, we apply the proposed methodology to two real data sets.

19.
Although the poor performance of the mean as a location estimate when outliers are present in the data is well known, there has been no clear consensus as to whether robust estimation or outlier detection is the appropriate corrective procedure. In this paper, the estimation accuracy of the sample mean and 27 robust estimation and outlier detection techniques is compared by computer simulation. Both symmetric and asymmetric contamination are considered. It is shown that the proper class of estimates depends on the degree of contamination, whether the contamination is symmetric or asymmetric, and the sample size. Several data sets considered previously by Rocke et al. (1982) are also examined.

20.
A method is developed for robust estimation and multiple outlier detection in time series generated by autoregressive integrated moving average processes in industrial environments. The procedure is based on reweighted maximum likelihood estimation using Huber or redescending weights and therefore generalizes the well-established robust M-estimation procedures used in the regression framework. When the scalar process is non-stationary, the required computations can be performed equally well using either the original undifferenced series or an auxiliary differenced series. Whereas the latter alternative may be preferred for scalar series, the former might be extended to cope with vector partially non-stationary time series without differencing the series, thus avoiding the non-invertibility and parameter identifiability problems caused by overdifferencing. The overall strategy is applied to two real industrial data sets.
