Similar literature
 20 similar documents found
1.
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. 
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
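
As an illustration of threshold-type outlier detection with Mahalanobis distance outlyingness, here is a minimal sketch for bivariate data. It is not the procedure of the paper above: it assumes the classical sample mean and covariance (which are themselves vulnerable to masking), and the function names and the 0.975 level are choices made only for this example. The cutoff uses the fact that the chi-square distribution with 2 degrees of freedom has the closed-form quantile -2 ln(1 - p).

```python
import math


def squared_mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean.

    Uses the classical (non-robust) sample mean and covariance matrix.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries with the usual n - 1 denominator.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers(points, level=0.975):
    """Flag points whose squared distance exceeds the chi-square(2) quantile."""
    cutoff = -2.0 * math.log(1.0 - level)
    return [d2 > cutoff for d2 in squared_mahalanobis_2d(points)]
```

Because the contours of this outlyingness function are ellipses by construction, the sketch also shows the restriction that the depth-based identifiers in the abstract are meant to overcome.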

2.
A leading multivariate extension of the univariate quantiles is the so-called “spatial” or “geometric” notion, for which sample versions are highly robust and conveniently satisfy a Bahadur–Kiefer representation. Another extension of univariate quantiles has been to univariate U-quantiles, on the basis of which, for example, the well-known Hodges–Lehmann location estimator has a natural formulation. Generalizing both extensions, we introduce multivariate spatial U-quantiles and develop a corresponding Bahadur–Kiefer representation. New statistics based on spatial U-quantiles are presented for nonparametric estimation of multiple regression coefficients, extending the classical Theil–Sen nonparametric simple linear regression slope estimator, and for robust estimation of multivariate dispersion. Some other applications are mentioned as well.
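
The two classical estimators named in the abstract above can each be written as a median of a kernel evaluated over pairs of observations, which is the univariate U-quantile idea. The sketch below covers only these univariate versions, not the multivariate spatial U-quantiles the paper introduces. The function names are illustrative, and the Hodges–Lehmann variant shown includes the pairs with i = j, which is one of several common conventions.

```python
from itertools import combinations, combinations_with_replacement
from statistics import median


def hodges_lehmann(xs):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(xs, 2)]
    return median(walsh)


def theil_sen_slope(xs, ys):
    """Median of the pairwise slopes (y_j - y_i) / (x_j - x_i)."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x have no defined slope
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    return median(slopes)
```

For example, `hodges_lehmann([1, 2, 3, 4, 100])` gives 3, and `theil_sen_slope([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])` gives 2.0, so a single gross error moves neither estimate far.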

3.
Suppose independent random samples are available from two normal populations with a common mean and unequal variances. Estimation of a quantile of the first population is considered with respect to the quadratic loss. Some new estimators for the quantile are proposed using some previously known estimators of a common mean. Inadmissibility results are proved for estimators which are equivariant under affine and location groups of transformations. Risk values of various estimators of a quantile are compared numerically using a detailed simulation study.

4.
Visuri et al. [2000. Sign and rank covariance matrices. J. Stat. Plann. Inference 91, 557–575] proposed a technique for robust covariance matrix estimation based on different notions of multivariate sign and rank. Among them, the spatial rank based covariance matrix estimator that utilizes a robust scale estimator is especially appealing due to its high robustness, computational ease, and good efficiency. Also, it is orthogonally equivariant under any distribution and affinely equivariant under elliptically symmetric distributions. In this paper, we study robustness properties of the estimator with respect to two measures: breakdown point and influence function. More specifically, the upper bound of the finite sample breakdown point can be achieved by a proper choice of univariate robust scale estimator. The influence functions for eigenvalues and eigenvectors of the estimator are derived. They are found to be bounded under some assumptions. Moreover, finite sample efficiency comparisons to popular robust MCD, M, and S estimators are reported.

5.
This paper extends the concept of risk unbiasedness to statistical prediction and nonstandard inference problems, by formalizing the idea that a risk unbiased predictor should be at least as close to the “true” predictant as to any “wrong” predictant, on the average. A novel aspect of our approach is measuring closeness between a predicted value and the predictant by a regret function, derived suitably from the given loss function. The general concept is more relevant than mean unbiasedness, especially for asymmetric loss functions. For squared error loss, we present a method for deriving best (minimum risk) risk unbiased predictors when the regression function is linear in a function of the parameters. We derive a Rao–Blackwell type result for a class of loss functions that includes squared error and LINEX losses as special cases. For location-scale families, we prove that if a unique best risk unbiased predictor exists, then it is equivariant. The concepts and results are illustrated with several examples. One interesting finding is that in some problems a best unbiased predictor does not exist, but a best risk unbiased predictor can be obtained. Thus, risk unbiasedness can be a useful tool for selecting a predictor.

6.
An affine equivariant estimate of multivariate location based on an adaptive transformation and retransformation approach is studied. The work is primarily motivated by earlier work on different versions of the multivariate median and their properties. We explore an issue related to efficiency and equivariance that was originally raised by Bickel and subsequently investigated by Brown and Hettmansperger. Our estimate has better asymptotic performance than the vector of co-ordinatewise medians when the variables are substantially correlated. The finite sample performance of the estimate is investigated by using Monte Carlo simulations. Some examples are presented to demonstrate the effect of the adaptive transformation–retransformation strategy in the construction of multivariate location estimates for real data.

7.
Outlier detection is a major topic in robust statistics due to the high practical significance of anomalous observations. Many existing methods, however, either are parametric or cease to perform well when the data are far from linearly structured. In this paper, we propose a quantity, Delaunay outlyingness, that is a nonparametric outlyingness score applicable to data with complicated structure. The approach is based on a well-known triangulation of the sample, which seems to reflect the sparsity of the point set in different directions in a useful way. We derive results on the asymptotic behavior of Delaunay outlyingness in the case of a sufficiently simple set of observations. Simulations and an application to empirical data are also discussed.

8.
The standard approach in change-point theory is to base the statistical analysis on a sample of fixed size. Alternatively, one observes some random phenomenon sequentially and takes action as soon as one observes some statistically significant deviation from the “normal” behaviour. The present paper is a continuation of Gut and Steinebach [2002. Truncated sequential change-point detection based on renewal counting processes. Scand. J. Statist. 29, 693–719]. The main point is that here we look in more detail into the behaviour of the relevant stopping times, in particular the time it takes from the actual change-point until the change is detected. More precisely, we prove asymptotics for stopping times under alternatives.

9.
In this paper, we propose a robust statistical inference approach for the varying coefficient partially nonlinear models based on quantile regression. A three-stage estimation procedure is developed to estimate the parameter and coefficient functions involved in the model. Under some mild regularity conditions, the asymptotic properties of the resulting estimators are established. Some simulation studies are conducted to evaluate the finite-sample performance as well as the robustness of our proposed quantile regression method versus the well-known profile least squares estimation procedure. Moreover, the Boston housing price data are used to further illustrate the application of the new method.

10.
Robust estimators of the scale parameters in the error-components model are described. The new estimators are based on the empirical characteristic functions of appropriate sets of residuals and are affine equivariant, consistent and asymptotically normal. The robustness of the new estimators is investigated via influence-function calculations. The results of Monte Carlo experiments and an example based on real data illustrate the usefulness of the estimators.

11.
In many areas of application, especially life testing and reliability, it is often of interest to estimate an unknown cumulative distribution function (cdf). A simultaneous confidence band (SCB) of the cdf can be used to assess the statistical uncertainty of the estimated cdf over the entire range of the distribution. Cheng and Iles [1983. Confidence bands for cumulative distribution functions of continuous random variables. Technometrics 25 (1), 77–86] presented an approach to construct an SCB for the cdf of a continuous random variable. For the log-location-scale family of distributions, they gave explicit forms for the upper and lower boundaries of the SCB based on expected information. In this article, we extend the work of Cheng and Iles (1983) in several directions. We study the SCBs based on local information, expected information, and estimated expected information for both the “cdf method” and the “quantile method.” We also study the effects of exceptional cases where a simple SCB does not exist. We describe calibration of the bands to provide exact coverage for complete data and type II censoring and better approximate coverage for other kinds of censoring. We also discuss how to extend these procedures to regression analysis.

12.
Classical multivariate methods are often based on the sample covariance matrix, which is very sensitive to outlying observations. One alternative to the covariance matrix is the affine equivariant rank covariance matrix (RCM) that has been studied in Visuri et al. [2003. Affine equivariant multivariate rank methods. J. Statist. Plann. Inference 114, 161–185]. In this article we assume that the covariance matrix is partially known and study how to estimate the corresponding RCM. We use the properties that the RCM is affine equivariant and that the RCM is proportional to the inverse of the regular covariance matrix, and hence reduce the problem of estimating the original RCM to estimating marginal rank covariance matrices. This is a great computational advantage when the dimension of the original data vector is large.

13.
A reasonable approach to robust regression estimation is minimizing a robust scale estimator of the pairwise differences of residuals. We introduce a large class of estimators based on this strategy extending ideas of Yohai and Zamar (Ann. Statist. (1993) 1824–1842) and Croux et al. (J. Am. Statist. Assoc. (1994) 1271–1281). The asymptotic robustness properties of the estimators in this class are addressed using the maxbias curve. We provide a general principle to compute this curve and present explicit formulae for several particular cases including generalized versions of S-, R- and τ-estimators. Finally, the most stable estimator in the class, that is, the estimator with the minimum maxbias curve, is shown to be the set of coefficients that minimizes an appropriate quantile of the distribution of the absolute pairwise differences of residuals.

14.
15.
In recent years, the Quintile Share Ratio (or QSR) has become a very popular measure of inequality. In 2001, the European Council decided that income inequality in European Union member states should be described using two indicators: the Gini Index and the QSR. The QSR is generally defined as the ratio of the total income earned by the richest 20% of the population relative to that earned by the poorest 20%. Thus, it can be expressed using quantile shares, where a quantile share is the share of total income earned by all of the units up to a given quantile. The aim of this paper is to propose an improved methodology for the estimation and variance estimation of the QSR in a complex sampling design framework. Because the QSR is a non-linear function of interest, the estimation of its sampling variance requires advanced methodology. Moreover, a non-trivial obstacle in the estimation of quantile shares in finite populations is the non-unique definition of a quantile. Thus, two different conceptions of the quantile share are presented in the paper, leading us to two different estimators of the QSR. Regarding variance estimation, Osier (2006, 2009) proposed a variance estimator based on linearization techniques. However, his method involves Gaussian kernel smoothing of cumulative distribution functions. Our approach, also based on linearization, shows that no smoothing is needed. The construction of confidence intervals is discussed and a proposition is made to account for the skewness of the sampling distribution of the QSR. Finally, simulation studies are run to assess the relevance of our theoretical results.
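
The definition of the QSR given in the abstract above can be sketched in a few lines. This is only a naive, unweighted illustration and not either of the estimators proposed in the paper: it ignores sampling weights and the complex design, and it resolves the non-unique quantile definition by simply taking the floor(n/5) smallest and largest incomes, which is an assumption made here for simplicity. The function name is hypothetical.

```python
def quintile_share_ratio(incomes):
    """Naive QSR: total income of the top fifth over that of the bottom fifth.

    Each fifth is taken as the floor(n / 5) largest or smallest observations,
    with every observation given equal weight.
    """
    xs = sorted(incomes)
    k = len(xs) // 5
    if k == 0:
        raise ValueError("need at least 5 observations")
    bottom = sum(xs[:k])   # total income of the poorest fifth
    top = sum(xs[-k:])     # total income of the richest fifth
    if bottom <= 0:
        raise ValueError("bottom-fifth total must be positive")
    return top / bottom
```

With incomes 1, 2, ..., 10 the bottom fifth totals 1 + 2 = 3 and the top fifth totals 9 + 10 = 19, so the ratio is 19/3. When n is not a multiple of 5, other conventions for the quintile boundary give different values, which is the ambiguity the abstract refers to.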

16.

Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect variables that contribute most to an outlier’s outlyingness. Thereby, it helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well both on simulated data and real life examples.

17.
This paper is concerned with estimating the common hazard rate of two exponential distributions with unknown and ordered location parameters under a general class of bowl-shaped scale invariant loss functions. The inadmissibility of the best affine equivariant estimator is established by deriving an improved estimator. Another estimator is obtained which improves upon the best affine equivariant estimator. A class of improving estimators is derived using the integral expression of risk difference approach of Kubokawa [A unified approach to improving equivariant estimators. Ann Statist. 1994;22(1):290–299]. These results are applied to specific loss functions. It is further shown that these estimators can be derived for four important sampling schemes: (i) complete and i.i.d. sample, (ii) record values, (iii) type-II censoring, and (iv) progressive Type-II censoring. A simulation study is carried out for numerically comparing the risk performance of these proposed estimators.

18.
A general way of detecting multivariate outliers involves using robust depth functions, or, equivalently, the corresponding ‘outlyingness’ functions; the more outlying an observation, the more extreme (less deep) it is in the data cloud and thus potentially an outlier. Most outlier detection studies in the literature assume that the underlying distribution is multivariate normal. This paper deals with the case of multivariate skewed data, specifically when the data follow the multivariate skew-normal [1] distribution. We compare the outlier detection capabilities of four robust outlier detection methods through their outlyingness functions in a simulation study. Two scenarios are considered for the occurrence of outliers: ‘the cluster’ and ‘the radial’. Conclusions and recommendations are offered for each scenario.

19.
In this paper, we consider the problem of estimating the quantile of a two-parameter exponential distribution with respect to an arbitrary strictly convex loss function under progressive type II censoring. Inadmissibility of the best affine equivariant (BAE) estimator is established through a conditional risk analysis. In particular we provide dominance results for quadratic, linex and absolute value loss functions. Further, a class of dominating estimators is derived using the IERD (integral expression of risk difference) approach of Kubokawa (1994. A unified approach to improving equivariant estimators. The Annals of Statistics 22 (1):290–299. doi:10.1214/aos/1176325369). In the sequel, the generalized Bayes estimator is shown to improve the BAE estimator.

20.
This paper studies penalized quantile regression for dynamic panel data with fixed effects, where the penalty involves l1 shrinkage of the fixed effects. Using extensive Monte Carlo simulations, we present evidence that the penalty term reduces the dynamic panel bias and increases the efficiency of the estimators. The underlying intuition is that there is no need to use instrumental variables for the lagged dependent variable in the dynamic panel data model without fixed effects. This provides an additional use for the shrinkage models, other than model selection and efficiency gains. We propose a Bayesian information criterion based estimator for the parameter that controls the degree of shrinkage. We illustrate the usefulness of the novel econometric technique by estimating a “target leverage” model that includes a speed of capital structure adjustment. Using the proposed penalized quantile regression model the estimates of the adjustment speeds lie between 3% and 44% across the quantiles, showing strong evidence that there is substantial heterogeneity in the speed of adjustment among firms.
