Similar literature
 Found 20 similar documents (search time: 806 ms)
1.
The paper compares six smoothers, in terms of mean squared error and bias, when there are multiple predictors and the sample size is relatively small. The focus is on methods that use robust measures of location (primarily a 20% trimmed mean) and where there are four predictors. To add perspective, some methods designed for means are included. The smoothers include the locally weighted (loess) method derived by Cleveland and Devlin [W.S. Cleveland, S.J. Devlin, Locally-weighted regression: an approach to regression analysis by local fitting, Journal of the American Statistical Association 83 (1988) 596–610], a variation of a so-called running interval smoother where distances from a point are measured via a particular group of projections of the data, a running interval smoother where distances are measured based in part on the minimum volume ellipsoid estimator, a generalized additive model based on the running interval smoother, a generalized additive model based on the robust version of the smooth derived by Cleveland [W.S. Cleveland, Robust locally weighted regression and smoothing scatterplots, Journal of the American Statistical Association 74 (1979) 829–836], and a kernel regression method stemming from [J. Fan, Local linear smoothers and their minimax efficiencies, The Annals of Statistics 21 (1993) 196–216]. When the regression surface is a plane, the method stemming from [J. Fan, Local linear smoothers and their minimax efficiencies, The Annals of Statistics 21 (1993) 196–216] was found to dominate, and indeed offers a substantial advantage in various situations, even when the error term has a heavy-tailed distribution. However, if there is curvature, this method can perform poorly compared to the other smooths considered. In that case, the projection-type smoother used in conjunction with a 20% trimmed mean is recommended, with the minimum volume ellipsoid method a close second.
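As an illustration of the robust location measure named in this abstract, the following is a minimal sketch of a 20% trimmed mean and of a running interval smoother built on it. It is deliberately simplified to a single predictor with plain absolute distance; the paper's smoothers use four predictors and projection-based or minimum-volume-ellipsoid-based distances. The function names and the `span` half-width parameter are assumptions made for this example, not taken from the paper.

```python
import math


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    ordered = sorted(values)
    g = math.floor(proportion * len(ordered))  # number trimmed from each tail
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


def running_interval_smooth(xs, ys, x0, span, proportion=0.2):
    """Trimmed mean of the responses whose predictor lies within `span` of x0.

    `span` is a hypothetical fixed half-width chosen for this sketch.
    """
    nearby = [y for x, y in zip(xs, ys) if abs(x - x0) <= span]
    if not nearby:
        raise ValueError("no observations within span of x0")
    return trimmed_mean(nearby, proportion)


# One gross outlier in the responses (400) barely affects the smooth at x0 = 2.
xs = [0, 1, 2, 3, 4, 10]
ys = [0, 1, 2, 3, 400, 10]
print(trimmed_mean([1, 2, 3, 4, 100]))         # 3.0
print(running_interval_smooth(xs, ys, 2, 2))   # 2.0
```

With five nearby points, 20% trimming removes one value from each tail, so a single outlying response is discarded before averaging.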

2.
In many clinical studies where time to failure is of primary interest, patients may fail or die from one of many causes, and failure time can be right censored. In some circumstances, patients may be known to have died although the cause of death is not available for some of them. Under the assumption that cause of death is missing at random, we compare the Goetghebeur and Ryan (1995, Biometrika, 82, 821–833) partial likelihood approach with the Dewanji (1992, Biometrika, 79, 855–857) partial likelihood approach. We show that the estimator for the regression coefficients based on the Dewanji partial likelihood is not only consistent and asymptotically normal, but also semiparametric efficient. While the Goetghebeur and Ryan estimator is more robust than the Dewanji partial likelihood estimator against misspecification of proportional baseline hazards, the Dewanji partial likelihood estimator allows the probability of missing cause of failure to depend on covariate information without the need to model the missingness mechanism. Tests for proportional baseline hazards are also suggested and a robust variance estimator is derived.

3.
The first step in statistical analysis is parameter estimation. In multivariate analysis, one of the parameters of interest is the mean vector. It is usually assumed that the data come from a multivariate normal distribution. In this situation, the maximum likelihood estimator (MLE), that is, the sample mean vector, is the best estimator. However, when outliers exist in the data, the sample mean vector gives poor estimates, so estimators that are robust to outliers should be used instead. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties. However, computing this estimator requires a robust estimate of the mean vector as a starting point. Usually the minimum volume ellipsoid (MVE) estimator is used as the starting point in computing the S-estimator. For high-dimensional data, computing the MVE takes too much time. In some cases, this time is so large that existing computers cannot perform the computation. In addition to the computation time, the MVE method is not precise for high-dimensional data sets. In this paper, a robust starting point for the S-estimator based on robust clustering is proposed, which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that the proposed estimator is precise and performs much better than some of the existing robust estimators for high-dimensional data.

4.

Cressie et al. (2000; 2003) introduced and studied a new family of statistics, based on the φ-divergence measure, for solving the problem of testing a nested sequence of loglinear models. In that family of test statistics the parameters are estimated using the minimum φ-divergence estimator, which is a generalization of the maximum likelihood estimator. In this paper we study the minimum power-divergence estimator (the most important family of minimum φ-divergence estimators) for a nested sequence of loglinear models in three-way contingency tables under assumptions of multinomial sampling. A simulation study illustrates that the minimum chi-squared estimator is simultaneously the most robust and efficient estimator among the family of minimum power-divergence estimators.

5.
In this paper, we study the robustness properties of several procedures for the joint estimation of shape and scale in a generalized Pareto model. The estimators that we primarily focus upon, the most bias robust estimator (MBRE) and the optimal MSE-robust estimator (OMSE), are one-step estimators distinguished as optimally robust in the shrinking neighbourhood setting; that is, on such a specific neighbourhood they minimize the maximal bias and the maximal mean squared error (MSE), respectively. For their initialization, we propose a particular location–dispersion estimator, MedkMAD, which matches the population median and kMAD (an asymmetric variant of the median of absolute deviations) against the empirical counterparts. These optimally robust estimators are compared to the maximum-likelihood, skipped maximum-likelihood, Cramér–von-Mises minimum distance, method-of-medians, and Pickands estimators. To quantify their deviation from robust optimality, for each of these suboptimal estimators, we determine the finite-sample breakdown point and the influence function, as well as the statistical accuracy measured by asymptotic bias, variance, and MSE, all evaluated uniformly on shrinking neighbourhoods. These asymptotic findings are complemented by an extensive simulation study to assess the finite-sample behaviour of the considered procedures. The applicability of the procedures and their stability against outliers are illustrated for the Danish fire insurance data set from the package evir.

6.
A robust estimator is developed for Poisson mixture models with a known number of components. The proposed estimator minimizes the L2 distance between a sample of data and the model. When the component distributions are completely known, the estimators for the mixing proportions are in closed form. When the parameters for the component Poisson distributions are unknown, numerical methods are needed to calculate the estimators. Compared to the minimum Hellinger distance estimator, the minimum L2 estimator can be less robust to extreme outliers, and often more robust to moderate outliers.
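To make the closed-form case concrete, here is a hedged sketch for a two-component Poisson mixture with known rates. It assumes the L2 criterion is the sum over x of (π·f1(x) + (1−π)·f2(x) − p̂(x))², where p̂ is the empirical probability mass function; setting the derivative in π to zero gives π = Σ d(x)(p̂(x) − f2(x)) / Σ d(x)² with d = f1 − f2. The function names, the truncation point `support_max`, and the clipping to [0, 1] are choices made for this example and may differ from the paper's formulation.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability mass at the non-negative integer x."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def empirical_pmf(data):
    """Relative frequency of each observed count."""
    n = len(data)
    pmf = {}
    for x in data:
        pmf[x] = pmf.get(x, 0.0) + 1.0 / n
    return pmf


def min_l2_proportion(p_hat, lam1, lam2, support_max=60):
    """Weight on component 1 minimising the summed squared pmf difference.

    Requires lam1 != lam2; the sums are truncated at `support_max`
    (an assumption of this sketch), and the result is clipped to [0, 1].
    """
    num = 0.0
    den = 0.0
    for x in range(support_max + 1):
        f1 = poisson_pmf(x, lam1)
        f2 = poisson_pmf(x, lam2)
        d = f1 - f2
        num += d * (p_hat.get(x, 0.0) - f2)
        den += d * d
    return min(1.0, max(0.0, num / den))


# Sanity check: feeding in the exact pmf of a 0.3/0.7 mixture recovers 0.3.
exact = {x: 0.3 * poisson_pmf(x, 2.0) + 0.7 * poisson_pmf(x, 8.0)
         for x in range(61)}
print(round(min_l2_proportion(exact, 2.0, 8.0), 6))
```

In practice one would pass `empirical_pmf(sample)` as `p_hat`; the exact-pmf input above is only a check that the closed form behaves as derived.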

7.
A robust estimator introduced by Beran (1977a, 1977b), which is based on the minimum Hellinger distance between a projection model density and a nonparametric sample density, is studied empirically. An extensive simulation provides an estimate of the small sample distribution and supplies empirical evidence of the estimator performance for a normal location-scale model. While the performance of the minimum Hellinger distance estimator is seen to be competitive with the maximum likelihood estimator at the true model, its robustness to deviations from normality is shown to be competitive in this setting with that obtained from the M-estimator and the Cramér-von Mises minimum distance estimator. Beran also introduced a goodness-of-fit statistic H², based on the minimized Hellinger distance between a member of a parametric family of densities and a nonparametric density estimate. We investigate the statistic H (the square root of H²) as a test for normality when both location and scale are unspecified. Empirically derived critical values are given which do not require extensive tables. The power of the statistic H compares favorably with four other widely used tests for normality.

8.
A new approach to forming a multivariate difference estimator is suggested which does not require knowledge of the unknown population parameters as such. It gives minimum variance among the class of multivariate difference estimators. The performance of this estimator with respect to Des Raj's (J. Amer. Statist. Assoc. 60 (1965), 270–277) multivariate difference estimator is illustrated. Using the information on two auxiliary variates, the robustness of Des Raj's estimator yd is studied empirically. Two new estimators of the population mean/total are developed along the same lines as yd. The performance of these estimators is studied for a wide variety of populations.

9.
Control charts are one of the most widely used techniques in statistical process control. In Phase I, historical observations are analysed in order to construct a control chart. Because multiple outliers can go undetected by control charts such as Hotelling’s T² due to the masking effect, robust alternatives to Hotelling’s T² have been developed based on minimum volume ellipsoid (MVE) estimators, minimum covariance determinant (MCD) estimators, reweighted MCD estimators or trimmed estimators. In this paper, we use a simulation study to analyse the performance of each alternative in various situations and offer guidance for the correct use of each estimator.

10.
11.
In this paper, we consider a regression model and propose estimators which are weighted averages of two of three estimators: the Stein-rule (SR), the minimum mean squared error (MMSE), and the adjusted minimum mean squared error (AMMSE) estimators. It is shown that one of the proposed estimators has smaller mean squared error (MSE) than the positive-part Stein-rule (PSR) estimator over a moderate region of parameter space when the number of regression coefficients is small (i.e., 3), and its MSE performance is comparable to the PSR estimator even when the number of regression coefficients is not so small.

12.
Numerous estimation techniques for regression models have been proposed. These procedures differ in how sample information is used in the estimation procedure. The efficiency of least squares (OLS) estimators implicitly assumes normally distributed residuals and is very sensitive to departures from normality, particularly to "outliers" and thick-tailed distributions. Least absolute deviation (LAD) estimators are less sensitive to outliers and are optimal for Laplace random disturbances, but not for normal errors. This paper reports Monte Carlo comparisons of OLS, LAD, two robust estimators discussed by Huber, three partially adaptive estimators, Newey's generalized method of moments estimator, and an adaptive maximum likelihood estimator based on a normal kernel studied by Manski. This paper is the first to compare the relative performance of some adaptive robust estimators (partially adaptive and adaptive procedures) with some common nonadaptive robust estimators. The partially adaptive estimators are based on three flexible parametric distributions for the errors. These include the power exponential (Box-Tiao) and generalized t distributions, as well as a distribution for the errors which is not necessarily symmetric. The adaptive procedures are "fully iterative" rather than one-step estimators. The adaptive estimators have desirable large sample properties, but these properties do not necessarily carry over to the small sample case.

The Monte Carlo comparisons of the alternative estimators are based on four different specifications for the error distribution: a normal, a mixture of normals (or variance-contaminated normal), a bimodal mixture of normals, and a lognormal. Five hundred samples of 50 are used. The adaptive and partially adaptive estimators perform very well relative to the other estimation procedures considered, and preliminary results suggest that in some important cases they can perform much better than OLS with 50 to 80% reductions in standard errors.
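The flavour of such a Monte Carlo comparison can be sketched in a heavily simplified setting. The example below assumes an intercept-only regression, where the OLS estimate reduces to the sample mean and the LAD estimate to the sample median, and draws 500 samples of size 50 from a variance-contaminated normal. The contamination fraction (10%), the contaminating standard deviation (10), and the seed are illustrative assumptions, not the paper's design, and none of the adaptive estimators are implemented here.

```python
import random
import statistics


def contaminated_normal(rng, eps=0.1, scale=10.0):
    """Draw from N(0, 1) with probability 1 - eps, else from N(0, scale^2)."""
    sd = scale if rng.random() < eps else 1.0
    return rng.gauss(0.0, sd)


def monte_carlo_mse(n_reps=500, n=50, seed=0):
    """Mean squared error of the sample mean (OLS) and median (LAD).

    The true location is 0, so each squared estimate is a squared error.
    """
    rng = random.Random(seed)
    se_mean = 0.0
    se_median = 0.0
    for _ in range(n_reps):
        sample = [contaminated_normal(rng) for _ in range(n)]
        se_mean += statistics.mean(sample) ** 2
        se_median += statistics.median(sample) ** 2
    return se_mean / n_reps, se_median / n_reps


mse_ols, mse_lad = monte_carlo_mse()
print(f"MSE of mean (OLS):   {mse_ols:.4f}")
print(f"MSE of median (LAD): {mse_lad:.4f}")
```

Under this contamination the median should show a clearly smaller mean squared error than the mean, which is the kind of sensitivity to thick tails the abstract attributes to OLS.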


13.
In this paper, we describe an overall strategy for robust estimation of multivariate location and shape, and the consequent identification of outliers and leverage points. Parts of this strategy have been described in a series of previous papers (Rocke, Ann. Statist., in press; Rocke and Woodruff, Statist. Neerlandica 47 (1993), 27–42, J. Amer. Statist. Assoc., in press; Woodruff and Rocke, J. Comput. Graphical Statist. 2 (1993), 69–95; J. Amer. Statist. Assoc. 89 (1994), 888–896) but the overall structure is presented here for the first time. After describing the first-level architecture of a class of algorithms for this problem, we review available information about possible tactics for each major step in the process. The major steps that we have found to be necessary are as follows: (1) partition the data into groups of perhaps five times the dimension; (2) for each group, search for the best available solution to a combinatorial estimator such as the Minimum Covariance Determinant (MCD); these are the preliminary estimates; (3) for each preliminary estimate, iterate to the solution of a smooth estimator chosen for robustness and outlier resistance; and (4) choose among the final iterates based on a robust criterion, such as minimum volume. Use of this algorithm architecture can enable reliable, fast, robust estimation of heavily contaminated multivariate data in high (> 20) dimension even with large quantities of data. A computer program implementing the algorithm is available from the authors.

14.
This paper compares three estimators for periodic autoregressive (PAR) models. The first is the classical periodic Yule-Walker estimator (YWE). The second is a robust version of YWE (RYWE) which uses the robust autocovariance function in the periodic Yule-Walker equations, and the third is the robust least squares estimator (RLSE) based on iterative least squares with robust versions of the original time series. Daily mean particulate matter concentration (PM10) data are used to illustrate the methodologies in a real application in the air quality area.

15.
In this paper we consider semiparametric inference methods for the time scale parameters in general time scale models (Oakes, 1995, Duchesne and Lawless, 2000). We use the results of Robins and Tsiatis (1992) and Lin and Ying (1995) to derive a rank-based estimator that is more efficient and robust than the traditional minimum coefficient of variation (min CV) estimator of Kordonsky and Gerstbakh (1993) for many underlying models. Moreover, our estimator can readily handle censored samples, which is not the case with the min CV method.

16.
Extreme value theory is very popular in applied sciences including finance, economics, hydrology and many other disciplines. In univariate extreme value theory, we model the data by a suitable distribution from the general max-domain of attraction characterized by its tail index; there are three broad classes of tails: the Pareto type, the Weibull type and the Gumbel type. The simplest and most common estimator of the tail index is the Hill estimator, which works only for Pareto type tails and has a high bias; it is also highly non-robust in the presence of outliers with respect to the assumed model. There have been some recent attempts to produce asymptotically unbiased or robust alternatives to the Hill estimator; however, each of the robust alternatives works for only one type of tail. This paper proposes a new general estimator of the tail index that is both robust and has smaller bias under all three tail types compared to the existing robust estimators. This essentially produces a robust generalization of the estimator proposed by Matthys and Beirlant (Stat Sin 13:853–880, 2003) under the same model approximation through a suitable exponential regression framework using the density power divergence. The robustness properties of the estimator are derived in the paper along with an extensive simulation study. A method for bias correction is also proposed with application to some real data examples.
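For reference, the Hill estimator mentioned in this abstract can be sketched in a few lines. The version below uses the standard form, the average of log X(i) − log X(k+1) over the k largest observations, which estimates the extreme value index (the reciprocal of the Pareto exponent). It is only the classical baseline, not the robust estimator the paper proposes; the function name and the toy data are assumptions made for this example.

```python
import math


def hill_estimator(data, k):
    """Hill estimate of the tail index from the k largest observations."""
    ordered = sorted(data, reverse=True)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    if ordered[k] <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    threshold = math.log(ordered[k])  # log of the (k+1)-th largest value
    return sum(math.log(x) - threshold for x in ordered[:k]) / k


# Toy data whose logs are 1, 2, 3, 4, 5.
data = [math.exp(i) for i in range(1, 6)]
print(hill_estimator(data, 2))                    # about 1.5

# A single extreme outlier moves the estimate drastically.
print(hill_estimator(data + [math.exp(50)], 2))   # about 23.5
```

The second call illustrates the non-robustness noted in the abstract: one aberrant observation in the upper tail dominates the average of log-spacings.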

17.
In this paper, we show a sufficient condition for an operational variant of the minimum mean squared error estimator (simply, the minimum MSE estimator) to dominate the ordinary least squares (OLS) estimator. It is also shown numerically that the minimum MSE estimator dominates the OLS estimator if the number of regression coefficients is larger than or equal to three, even if the sufficient condition is not satisfied. When the number of regression coefficients is smaller than three, our numerical results show that the gain in MSE from using the minimum MSE estimator is larger than the loss.

19.
A reasonable approach to robust regression estimation is minimizing a robust scale estimator of the pairwise differences of residuals. We introduce a large class of estimators based on this strategy extending ideas of Yohai and Zamar (Am. Statist. (1993) 1824–1842) and Croux et al. (J. Am. Statist. Assoc. (1994) 1271–1281). The asymptotic robustness properties of the estimators in this class are addressed using the maxbias curve. We provide a general principle to compute this curve and present explicit formulae for several particular cases including generalized versions of S-, R- and τ-estimators. Finally, the most stable estimator in the class, that is, the estimator with the minimum maxbias curve, is shown to be the set of coefficients that minimizes an appropriate quantile of the distribution of the absolute pairwise differences of residuals.

20.
Summary: Lp-norm weighted depth functions are introduced and the local and global robustness of these weighted Lp-depth functions and their induced multivariate medians are investigated via influence function and finite sample breakdown point. To study the global robustness of depth functions, a notion of finite sample breakdown point is introduced. The weighted Lp-depth functions turn out to have the same low breakdown point as some other popular depth functions. Their influence functions are also unbounded. On the other hand, the weighted Lp-depth induced medians are globally robust with the highest possible breakdown point for any reasonable estimator. The weighted Lp-medians are also locally robust with bounded influence functions for suitable weight functions. Unlike other existing depth functions and multivariate medians, the weighted Lp depth and medians are easy to calculate in high dimensions. The price for this advantage is the lack of affine invariance and equivariance of the weighted Lp depth and medians, respectively. *The author thanks the referees for their very insightful and constructive comments and suggestions which led to corrections and substantial improvements. Supported in part by NSF Grants DMS-0071976 and DMS-0134628.
