Similar Articles
 20 similar articles found (search time: 46 ms)
1.
Summary: Data depth is a concept that measures the centrality of a point in a given data cloud x_1, x_2, ..., x_n or in a multivariate distribution P_X on ℝ^d. Every depth defines a family of so-called trimmed regions. The α-trimmed region is the set of points that have a depth of at least α. Data depth has been used to define multivariate measures of location and dispersion as well as multivariate dispersion orders. If the depth of a point can be represented as the minimum of the depths with respect to all one-dimensional projections, we say that the depth satisfies the (weak) projection property. Many depths proposed in the literature can be shown to satisfy the weak projection property. A depth is said to satisfy the strong projection property if, for every α, the one-dimensional projection of the α-trimmed region equals the α-trimmed region of the projected distribution. After a short introduction to the general concept of data depth, we formally define the weak and the strong projection property and give necessary and sufficient criteria for the projection property to hold. We further show that the projection property facilitates the construction of depths from univariate trimmed regions. We discuss some of the depths proposed in the literature which possess the projection property and define a general class of projection depths, which are constructed from univariate trimmed regions by the above method. Finally, algorithmic aspects of projection depths are discussed. We describe an algorithm which enables the approximate computation of depths that satisfy the projection property.
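The weak projection property suggests a simple approximation scheme: minimise the univariate (Tukey) depth of the projected point over many random unit directions. Below is a minimal sketch of such an algorithm; the function name and the use of random directions are illustrative, not the abstract's exact procedure.

```python
import numpy as np

def approx_projection_depth(x, data, n_dirs=500, rng=None):
    """Approximate a depth satisfying the (weak) projection property:
    the depth of x is the minimum, over unit directions u, of the
    univariate Tukey depth of u'x within the projected sample u'X.
    Random directions give an upper bound that tightens as n_dirs grows."""
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    u = rng.standard_normal((n_dirs, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit directions
    proj_data = data @ u.T            # (n, n_dirs): projected sample
    proj_x = x @ u.T                  # (n_dirs,): projected point
    n = data.shape[0]
    below = (proj_data <= proj_x).sum(axis=0)
    above = (proj_data >= proj_x).sum(axis=0)
    return np.min(np.minimum(below, above)) / n
```

A central point of a symmetric cloud gets depth near 1/2, while points far outside the cloud get depth near 0, matching the trimmed-region interpretation above.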

2.
Superefficiency of a projection density estimator. The author constructs a projection density estimator with a data-driven truncation index. This estimator attains the superoptimal rate 1/n in mean integrated squared error and {ln ln(n)/n}^{1/2} in uniform almost-sure convergence over a given subspace which is dense in the class of all possible densities; the rate of the estimator is quasi-optimal everywhere else. The subspace in question may be chosen a priori by the statistician.

3.
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on multivariate depth functions, which can generate contours following the shape of the data set. Also, we study masking robustness, that is, robustness against misidentification of outliers as nonoutliers. In particular, we define a masking breakdown point (MBP), adapting to our setting certain ideas of Davies and Gather [1993. The identification of multiple outliers (with discussion). Journal of the American Statistical Association 88, 782–801] and Becker and Gather [1999. The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association 94, 947–955] based on the Mahalanobis distance outlyingness. We then compare four affine invariant outlier detection procedures, based on Mahalanobis distance, halfspace or Tukey depth, projection depth, and “Mahalanobis spatial” depth. For the goal of threshold type outlier detection, it is found that the Mahalanobis distance and projection procedures are distinctly superior in performance, each with very high MBP, while the halfspace approach is quite inferior. When a moderate MBP suffices, the Mahalanobis spatial procedure is competitive in view of its contours not constrained to be elliptical and its computational burden relatively mild. A small sampling experiment yields findings completely in accordance with the theoretical comparisons. 
While these four depth procedures are relatively comparable for the purpose of robust affine equivariant location estimation, the halfspace depth is not competitive with the others for the quite different goal of robust setting of an outlyingness threshold.
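For the Mahalanobis distance procedure, a threshold-type detector can be sketched as follows. This simplified version uses the sample mean and covariance with a chi-squared cutoff; a robust location/scatter estimate would be needed to realise the high masking breakdown point discussed in the abstract, and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def mahalanobis_outliers(X, alpha=0.01):
    """Threshold-type outlier detection from Mahalanobis distance
    outlyingness. Simplified sketch: sample mean/covariance, so the
    contours are ellipsoidal and the rule is not masking-robust."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # squared Mahalanobis distance of every row
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    cutoff = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2 > cutoff
```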

4.
The RV-coefficient (Escoufier, 1973; Robert and Escoufier, 1976) is studied as a sensitivity coefficient of the subspace spanned by dominant eigenvectors in principal component analysis. We use the perturbation expansion, up to the second-order term, of the corresponding projection matrix. The relationship with the measures of Benasseni (1990) and Krzanowski (1979) is also discussed.
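For reference, the RV coefficient itself compares the configurations XX' and YY' of two column-centred data matrices observed on the same individuals. A direct implementation is sketched below; the abstract's contribution is the perturbation expansion, not the coefficient itself.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient of Escoufier (1973): a matrix correlation between
    two data sets on the same n individuals, invariant to rotations of
    either set. Both matrices are column-centred before comparison."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sx, Sy = X @ X.T, Y @ Y.T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
```

Because the coefficient depends on the data only through XX' and YY', it equals 1 whenever Y is an orthogonal transformation of X.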

5.
ABSTRACT

Orthogonal arrays are used as screening designs to identify active main effects, after which the properties of the subdesign for estimating these effects, and possibly their interactions, become important. Such a subdesign is known as a "projection design". In this article, we identify all the geometrically non-isomorphic projection designs of an OA(27,13,3,2), an OA(18,7,3,2) and an OA(36,13,3,2) into k = 3, 4, and 5 factors when they are used for screening out active quantitative experimental factors, with regard to the prior selection of the middle level of the factors. We use the popular D-efficiency criterion to evaluate the ability of each design found in estimating the parameters of a second-order model.

6.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when there is a small number of observations relative to the number of variables, known as a large p (dimension), small n (sample size) problem. This paper discusses the relationship between sample size and dimensionality in classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.
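A classical index of this kind scores a candidate direction by the share of projected variance explained by class differences. The sketch below implements that baseline LDA-type index for a single direction; it is illustrative only, since the abstract's proposed index modifies such a criterion to remain stable when p is large relative to n.

```python
import numpy as np

def lda_index(u, X, y):
    """1-D projection pursuit index for classification: the ratio of
    between-class to total variance of the data projected onto u.
    Values near 1 indicate a direction that separates the classes."""
    z = X @ (u / np.linalg.norm(u))        # project onto unit direction
    grand = z.mean()
    classes, counts = np.unique(y, return_counts=True)
    between = sum(c * (z[y == k].mean() - grand) ** 2
                  for k, c in zip(classes, counts))
    total = ((z - grand) ** 2).sum()
    return between / total
```

Projection pursuit then maximises such an index over directions u.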

7.
This paper is a continuation of an earlier one (1992) in which the author studied the paradoxes that can arise when a nonparametric statistical test is used to give an ordering of k samples and the subsets of those samples. This article characterizes the projection paradoxes that can occur when using contingency tables, complete block designs, and tests of dichotomous behaviour of several samples. This is done by examining the "dictionaries" of possible orderings of each of these procedures. Specifically, it is shown that contingency tables and complete block designs, like the Kruskal-Wallis nonparametric test on k samples, minimize the number and kinds of projection paradoxes that can occur; however, using a test of dichotomous behaviour of several samples does not. An analysis is given of two procedures used to determine the ordering of a pair of samples from a set of k samples. It is shown that these two procedures may not have anything in common.

8.
In the present article, we discuss the regression of a point on the surface of a unit sphere in d dimensions given a point on the surface of a unit sphere in p dimensions, where p may not be equal to d. Point projection is added to the rotation and linear transformation in the regression link function. The identifiability of the model is proved. Then, parameter estimation in this setup is discussed. Simulation studies and data analyses are presented to illustrate the model.

9.
In this paper, we consider the problem of adaptive density or survival function estimation in an additive model defined by Z = X + Y with X independent of Y, when both random variables are non-negative. This model is relevant, for instance, in reliability fields where we are interested in the failure time of a certain material that cannot be isolated from the system it belongs to. Our goal is to recover the distribution of X (density or survival function) through n observations of Z, assuming that the distribution of Y is known. This issue can be seen as the classical statistical problem of deconvolution, which has been tackled in many cases using Fourier-type approaches. Nonetheless, in the present case, the random variables have the particularity of being supported on the non-negative half-line. Exploiting this, we propose a new angle of attack by building a projection estimator with an appropriate Laguerre basis. We present upper bounds on the mean integrated squared risk of our density and survival function estimators. We then describe a non-parametric data-driven strategy for selecting a relevant projection space. The procedures are illustrated with simulated data and compared with the performance of a more classical deconvolution setting using a Fourier approach. Our procedure achieves faster convergence rates than Fourier methods for estimating these functions.
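The Laguerre basis in question is orthonormal on [0, ∞). A minimal sketch of a projection density estimator on that basis is shown below for the direct (no-convolution) case, with a fixed truncation index m rather than the paper's data-driven selection; the deconvolution estimator would additionally invert the known distribution of Y on the same basis.

```python
import numpy as np
from scipy.special import eval_laguerre

def laguerre_density(sample, x, m=10):
    """Projection density estimator for a non-negative variable on the
    Laguerre basis phi_k(t) = sqrt(2) * L_k(2t) * exp(-t), which is
    orthonormal on [0, inf). Coefficients a_k = E[phi_k(X)] are
    estimated by sample means; m is a fixed truncation index."""
    def phi(k, t):
        return np.sqrt(2.0) * eval_laguerre(k, 2.0 * t) * np.exp(-t)
    coeffs = [phi(k, sample).mean() for k in range(m)]
    return sum(a * phi(k, x) for k, a in enumerate(coeffs))
```

For an Exp(1) sample the true density is exactly (1/√2)·phi_0, so low-order coefficients already recover it well.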

10.
This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat à l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the ‘true’ ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem.
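The projection-plus-k-means step can be sketched as follows. This is a simplified stand-in: the empirical SVD basis plays the role of the selected orthonormal basis, and the initialisation is a deterministic farthest-point rule rather than anything from the paper.

```python
import numpy as np

def cluster_curves(curves, k=2, m=5, n_iter=50):
    """Cluster discretised curves: project onto the first m left
    singular directions (an empirical orthonormal basis), then run a
    plain k-means in the m-dimensional coefficient space."""
    centred = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt[:m].T               # projection coefficients
    # deterministic farthest-point initialisation of the k centres
    centres = [coords[0]]
    for _ in range(k - 1):
        d2 = np.min([((coords - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(coords[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        labels = np.argmin(((coords[:, None, :] - centres) ** 2).sum(axis=2), axis=1)
        centres = np.array([coords[labels == j].mean(axis=0) for j in range(k)])
    return labels
```

Working in the coefficient space makes each k-means iteration cost O(nkm) instead of O(nk × grid size).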

11.
In this paper, a nonparametric discriminant analysis procedure that is less sensitive than traditional procedures to deviations from the usual assumptions is proposed. The procedure uses the projection pursuit methodology where the projection index is the two-group transvariation probability. Montanari [A. Montanari, Linear discriminant analysis and transvariation, J. Classification 21 (2004), pp. 71–88] proposed and used this projection index to measure group separation but allocated the new observation using distances. Our procedure employs a method of allocation based on group–group transvariation probability to classify the new observation. A simulation study shows that the procedure proposed in this paper provides lower misclassification error rates than classical procedures like linear discriminant analysis and quadratic discriminant analysis and recent procedures like maximum depth and Montanari's transvariation-based classifiers, when the underlying distributions are skewed and/or the prior probabilities are unequal.

12.
An approach to non-linear principal components using radially symmetric kernel basis functions is described. The procedure consists of two steps. The first is a projection of the data set to a reduced dimension using a non-linear transformation whose parameters are determined by the solution of a generalized symmetric eigenvector equation. This is achieved by demanding a maximum-variance transformation subject to a normalization condition (Hotelling's approach) and can be related to the homogeneity analysis approach of Gifi through the minimization of a loss function. The transformed variables are the principal components, whose values define contours, or more generally hypersurfaces, in the data space. The second stage of the procedure defines the fitting surface, the principal surface, in the data space (again as a weighted sum of kernel basis functions) using the definition of self-consistency of Hastie and Stuetzle. The parameters of this principal surface are determined by a singular value decomposition, and cross-validation is used to obtain the kernel bandwidths. The approach is assessed on four data sets.

13.
In this article we provide saddlepoint approximations for some important models of circular data. The particularity of these saddlepoint approximations is that they do not require solving the saddlepoint equation iteratively, so their evaluation is immediate. We first give very accurate approximations to P-values, critical values and power functions for some optimal tests regarding the concentration parameter under wrapped symmetric α-stable and circular normal models. Then, we consider an approximation to the distribution of a projection of the two-dimensional Pearson random walk with exponential step sizes.

14.
We derive the best linear unbiased interpolation for the missing order statistics of a random sample using the well-known projection theorem. The proposed interpolation method only needs the first two moments on both sides of a missing order statistic. A simulation study is performed to compare the proposed method with a few interpolation methods for exponential and Lévy distributions.
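A hedged sketch of the idea: the best linear predictor of a missing order statistic given its neighbours needs only their first two moments, which the code below simply estimates by Monte Carlo rather than deriving them exactly as the paper does. Function names and the simulation shortcut are illustrative.

```python
import numpy as np

def blu_interpolate(i, known_idx, n, sampler, n_sim=20000, rng=0):
    """Best linear prediction of the i-th order statistic (0-based) of a
    sample of size n from the order statistics at known_idx. The
    projection-theorem solution w = Cov(Z)^{-1} Cov(Z, target) is built
    from moments estimated by simulating from `sampler`."""
    rng = np.random.default_rng(rng)
    sims = np.sort(sampler(rng, (n_sim, n)), axis=1)
    Z, target = sims[:, known_idx], sims[:, i]
    cov = np.cov(Z, rowvar=False)
    cross = np.cov(np.column_stack([Z, target]), rowvar=False)[:-1, -1]
    w = np.linalg.solve(cov, cross)
    intercept = target.mean() - w @ Z.mean(axis=0)
    return lambda z: intercept + w @ z       # predictor of the missing value
```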

15.
Abstract

It is shown in this paper that a quasi-order for the vectors in ℝ^p is cone-induced if and only if the order is preserved under limits and under linear combinations with non-negative coefficients. For the mean vectors in MANOVA subject to the restriction of simple ordering, a pseudo restricted MLE is proposed. This estimator is a matrix projection onto a closed convex set inside the restricted domain. An algorithm for the pseudo restricted MLE is developed that computes the matrix projections using only vector projections.

16.
In multiple linear regression analysis, each lower-dimensional subspace L of a known linear subspace M of ℝ^n corresponds to a nonempty subset of the columns of the regressor matrix. For a fixed subspace L, the C_p statistic is an unbiased estimator of the mean squared error if the projection of the response vector onto L is used to estimate the expected response. In this article, we consider two truncated versions of the C_p statistic that can also be used to estimate this mean squared error. The C_p statistic and its truncated versions are compared on two example data sets, illustrating that use of the truncated versions may result in models different from those selected by the standard C_p.
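For concreteness, the (untruncated) C_p statistic for a submodel can be computed as below, with the error variance estimated from the full model. This is the standard Mallows formulation, not the article's truncated variants.

```python
import numpy as np

def mallows_cp(X_full, y, cols):
    """Mallows' C_p = SSE_L / s^2 - n + 2*dim(L) for the submodel using
    the given columns, where s^2 is the residual variance of the full
    model. When the submodel is adequate, C_p is close to dim(L)."""
    n, p_full = X_full.shape
    resid_full = y - X_full @ np.linalg.lstsq(X_full, y, rcond=None)[0]
    s2 = resid_full @ resid_full / (n - p_full)   # full-model variance
    Xs = X_full[:, cols]
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return resid @ resid / s2 - n + 2 * len(cols)
```

Dropping an active regressor inflates SSE_L and hence C_p far above dim(L), which is what subset-selection rules exploit.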

17.
We present sharp mean-variance bounds for expectations of kth record values based on distributions coming from restricted families. These families are defined in terms of convex or star ordering with respect to the generalized Pareto distribution. The bounds for expectations of kth record values from the DD, DFR, DDA, and DFRA families are special cases of our results. The bounds are derived by application of the projection method.

18.
We consider the problem of constructing search designs for 3^m factorial designs. By using projection properties of some three-level orthogonal arrays, search designs are obtained for 3 ≤ m ≤ 11. The newly obtained orthogonal search designs are capable of searching for and identifying up to four two-factor interactions and estimating them along with the general mean and main effects. The resulting designs have very high search probabilities; that is, besides their well-known orthogonal structure, they have high ability to find the true effects.

19.
A robust estimator introduced by Beran (1977a, 1977b), which is based on the minimum Hellinger distance between a projection model density and a nonparametric sample density, is studied empirically. An extensive simulation provides an estimate of the small-sample distribution and supplies empirical evidence of the estimator's performance for a normal location-scale model. While the performance of the minimum Hellinger distance estimator is seen to be competitive with the maximum likelihood estimator at the true model, its robustness to deviations from normality is shown to be competitive in this setting with that obtained from the M-estimator and the Cramér-von Mises minimum distance estimator. Beran also introduced a goodness-of-fit statistic H^2, based on the minimized Hellinger distance between a member of a parametric family of densities and a nonparametric density estimate. We investigate the statistic H (the square root of H^2) as a test for normality when both location and scale are unspecified. Empirically derived critical values are given which do not require extensive tables. The power of the statistic H compares favorably with four other widely used tests for normality.
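A sketch of the estimator's construction under stated assumptions: a Gaussian kernel density estimate plays the role of the nonparametric density, and the Hellinger integral is approximated on a grid. Beran's version fixes particular kernel and bandwidth choices, so this is illustrative only; the minimised distance also yields the goodness-of-fit statistic H.

```python
import numpy as np
from scipy import stats, optimize

def min_hellinger_normal(sample, grid_pts=400):
    """Minimum Hellinger distance fit of a normal location-scale model:
    minimise H^2(f_theta, g_n) = 1 - integral sqrt(f_theta * g_n) over
    (mu, sigma), where g_n is a kernel density estimate of the sample.
    Returns (mu_hat, sigma_hat, H)."""
    kde = stats.gaussian_kde(sample)
    lo = sample.min() - 3.0 * sample.std()
    hi = sample.max() + 3.0 * sample.std()
    x = np.linspace(lo, hi, grid_pts)
    dx = x[1] - x[0]
    g = kde(x)

    def hell2(theta):                       # squared Hellinger distance
        f = stats.norm.pdf(x, theta[0], np.exp(theta[1]))
        return 1.0 - np.sum(np.sqrt(f * g)) * dx

    res = optimize.minimize(hell2, x0=[sample.mean(), np.log(sample.std())])
    return res.x[0], np.exp(res.x[1]), np.sqrt(max(res.fun, 0.0))
```

Parametrising the scale as exp(log_sigma) keeps the optimisation unconstrained.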

20.
Two-level designs are useful for examining a large number of factors in an efficient manner. It is typically anticipated that only a few factors will be identified as important ones. The results can then be reanalyzed using a projection of the original design into the space of the factors that matter. An interesting question is how many intrinsically different types of projections are possible from a given initial design. We examine this question here for the Plackett and Burman screening series with N = 12, 20 and 24 runs and projected dimensions k ≤ 5. As a characterization criterion, we look at the number of repeat and mirror-image runs in the projections. The idea can be applied to any two-level design projected into fewer dimensions.
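As an illustration of the criterion, the sketch below builds the 12-run Plackett-Burman design from its classical cyclic generator and counts repeat and mirror-image runs in a projection. The column indices are chosen arbitrarily; for N = 12, every 3-factor projection is known to be equivalent to a full 2^3 plus a half-replicate, giving four repeated runs.

```python
import numpy as np
from itertools import combinations

def pb12():
    """12-run Plackett-Burman design: cyclic shifts of the classical
    generator row (from the quadratic residues mod 11) plus a row of -1s."""
    gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(gen, i) for i in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows, dtype=int)

def repeat_and_mirror_counts(design, cols):
    """Counts of repeat runs and mirror-image runs (pairs of runs that
    are sign reversals of each other) in the projection onto cols."""
    proj = design[:, cols]
    pairs = combinations(range(len(proj)), 2)
    repeats = mirrors = 0
    for i, j in pairs:
        repeats += np.array_equal(proj[i], proj[j])
        mirrors += np.array_equal(proj[i], -proj[j])
    return repeats, mirrors
```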

