Similar literature
20 similar articles found.
1.
Loosely speaking, a robust projection index is one that prefers projections involving true clusters over projections consisting of a cluster and an outlier. We introduce a mathematical definition of one-dimensional index robustness and describe a numerical experiment to measure it. We design five new indices based on measuring divergence from Student's t-distribution, which are intended to be especially robust: the experiment shows that they are more robust than several established indices. The experiment also reveals more generally that the robustness of moment indices depends on the number of approximation terms, providing additional practical guidance for existing projection pursuit implementations. We investigate the theoretical properties of one new Student t-index and Hall's index and show that the new index automatically adapts its robustness to the degree of outlier contamination. We conclude by outlining the possibilities for extending our experiments to both higher dimensions and other new indices.

2.
In high-dimensional data, one often seeks a few interesting low-dimensional projections which reveal important aspects of the data. Projection pursuit for classification finds projections that reveal differences between classes. Even though projection pursuit is used to bypass the curse of dimensionality, most indexes will not work well when the number of observations is small relative to the number of variables, known as the large p (dimension), small n (sample size) problem. This paper discusses how the relationship between sample size and dimensionality affects classification and proposes a new projection pursuit index that overcomes the problem of small sample size for exploratory classification.

3.
A novel projection pursuit method based on projecting the data onto itself is proposed. Using a number of real datasets, it is shown how to obtain interesting one- and two-dimensional projections using only O(n) evaluations of a one-dimensional projection index.

4.
The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward because projections are smoothing operations in that structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed that is based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects, such as clustering, so to this end the data are first sphered to remove all of the location, scale and correlational structure. A set of simulated bivariate data and data on the artistic qualities of painters are used as examples.
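
The sphering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes bivariate data, a symmetric inverse square root of the sample covariance as the sphering matrix, and a hypothetical function name `sphere_2d`.

```python
import math

def sphere_2d(points):
    """Center 2-D data and rescale it so the sample covariance is the identity."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix S = [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Symmetric square root of a 2x2 positive-definite matrix:
    # sqrt(S) = (S + sqrt(det S) * I) / sqrt(trace S + 2 * sqrt(det S)).
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2 * s)
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t
    # Invert sqrt(S) to obtain the sphering matrix W = S^(-1/2).
    det = ra * rc - rb * rb
    wa, wb, wc = rc / det, -rb / det, ra / det
    return [(wa * x + wb * y, wb * x + wc * y) for x, y in centered]
```

After this transformation the data have zero mean, unit variances and zero correlation, so any remaining structure (such as clustering) is non-linear. The sketch assumes the sample covariance is non-singular.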

5.
Two-level designs are useful to examine a large number of factors in an efficient manner. It is typically anticipated that only a few factors will be identified as important ones. The results can then be reanalyzed using a projection of the original design, projected into the space of the factors that matter. An interesting question is how many intrinsically different types of projections are possible from an initial given design. We examine this question here for the Plackett and Burman screening series with N = 12, 20 and 24 runs and projected dimensions k≤5. As a characterization criterion, we look at the number of repeat and mirror-image runs in the projections. The idea can be applied to any two-level design projected into fewer dimensions.

6.
Testing the joint independence of variables has long been an interesting issue in statistical inference. Blum, Kiefer and Rosenblatt (1961) suggested a test based on a sample distribution function. To overcome the sparseness of data points in high-dimensional space and deal with general cases, in this paper we suggest several extended versions of B-K-R tests via projection pursuit. The bootstrap method is applied to determine the critical values and, for computational reasons, an approximation to the bootstrap statistics derived by the number-theoretic method is suggested. Several simulation experiments are performed and a real-life example is investigated.

7.
To seek the nonlinear structure hidden in high-dimensional data points, a transformation related to the projection pursuit method and a projection index were proposed by Li (1989, 1990). In this paper, we present a consistent estimator of the supremum of the projection index based on the sliced inverse regression technique. This estimator also suggests a method to obtain approximately the most interesting projection in the general case.

8.
Projection techniques for nonlinear principal component analysis (total citations: 4; self-citations: 0; citations by others: 4)
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest proportion of variance in the data. Nonlinear PCA addresses the nonlinearity problem by relaxing the linear restrictions on standard PCA. We investigate both linear and nonlinear approaches to PCA both exclusively and in combination. In particular we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to some simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.

9.
Tests of forecast accuracy and bias for county population projections (total citations: 1; self-citations: 0; citations by others: 1)
"This article deals with the forecast accuracy and bias of population projections for 2,971 counties in the United States. It uses three different projection techniques and data from 1950, 1960, 1970, and 1980 to make two sets of 10-year projections and one set of 20-year projections. These projections are compared with census counts to determine forecast errors. The size, direction, and distribution of forecast errors are analyzed by size of place, rate of growth, and length of projection horizon. A number of consistent patterns are noted, and an extension of the empirical results to the production of confidence intervals for population projections is considered." A comment by Paul M. Beaumont and Andrew M. Isserman is included (pp. 1,004-9) together with a rejoinder by the author (pp. 1,009-12). This is a revised version of a paper presented at the 1986 Annual Meeting of the Population Association of America (see Population Index, Vol. 52, No. 3, Fall 1986, p. 456).
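
Comparing projections with census counts can be sketched as follows. The abstract does not name its error measures, so this example assumes two common ones: the mean absolute percent error for accuracy (size of error) and the mean algebraic percent error for bias (direction of error). The function names and figures are hypothetical.

```python
def percent_errors(projected, actual):
    """Signed percent error of each projection relative to the census count."""
    return [100.0 * (p - a) / a for p, a in zip(projected, actual)]

def summarize_errors(projected, actual):
    """Return accuracy (MAPE) and bias (MALPE) for a set of projections."""
    errors = percent_errors(projected, actual)
    n = len(errors)
    # Accuracy: average size of the error, ignoring its direction.
    mape = sum(abs(e) for e in errors) / n
    # Bias: average signed error; positive means projections ran too high.
    malpe = sum(errors) / n
    return {"mape": mape, "malpe": malpe}

# Hypothetical projections and census counts for three counties.
summary = summarize_errors([110.0, 95.0, 205.0], [100.0, 100.0, 200.0])
```

Grouping the inputs by size of place, growth rate, or projection horizon before calling `summarize_errors` would give the kind of breakdown the abstract describes.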

10.
Statistics 2012, 46(6):1357-1385
The early stages of many real-life experiments involve a large number of factors among which only a few factors are active. Unfortunately, the optimal full-dimensional designs of those early stages may have bad low-dimensional projections, and the experimenters do not know which factors will turn out to be important before conducting the experiment. Therefore, designs with good projections are desirable for factor screening. In this regard, significant questions arise: do optimal full-dimensional designs have good projections onto low dimensions? How can experimenters measure the goodness of a full-dimensional design by focusing on all of its projections? And are there linkages between the optimality of a full-dimensional design and the optimality of its projections? Through theoretical justifications, this paper tries to provide answers to these interesting questions by investigating the construction of optimal (average) projection designs for screening either nominal or quantitative factors. The main results show that: under the aberration and orthogonality criteria, the full-dimensional design is optimal if and only if it is an optimal projection design; the full-dimensional design is optimal under the aberration and orthogonality criteria if and only if it is a uniform projection design; there is no guarantee that a uniform full-dimensional design is an optimal projection design under any criterion; a projection design is optimal under the aberration, orthogonality and uniformity criteria if it is optimal under any one of them; and saturated orthogonal designs have the same average projection performance.

11.
Lu Lin. Statistical Methodology 2006, 3(4):444-455
If the form of the distribution of data is unknown, the Bayesian method fails in the parametric inference because there is no posterior distribution of the parameter. In this paper, a theoretical framework of Bayesian likelihood is introduced via the Hilbert space method, which is free of the distributions of data and the parameter. The posterior distribution and posterior score function based on given inner products are defined and, consequently, the quasi posterior distribution and quasi posterior score function are derived, respectively, as the projections of the posterior distribution and posterior score function onto the space spanned by given estimating functions. In the space spanned by data, particularly, an explicit representation for the quasi posterior score function is obtained, which can be derived as a projection of the true posterior score function onto this space. The methods of constructing conservative quasi posterior score and quasi posterior log-likelihood are proposed. Some examples are given to illustrate the theoretical results. As an application, the quasi posterior distribution functions are used to select variables for generalized linear models. It is proved that, for linear models, the variable selections via quasi posterior distribution functions are equivalent to the variable selections via the penalized residual sum of squares or regression sum of squares.

12.
Abstract. We investigate resampling methodologies for testing the null hypothesis that two samples of labelled landmark data in three dimensions come from populations with a common mean reflection shape or mean reflection size‐and‐shape. The investigation includes comparisons between (i) two different test statistics that are functions of the projection onto tangent space of the data, namely the James statistic and an empirical likelihood statistic; (ii) bootstrap and permutation procedures; and (iii) three methods for resampling under the null hypothesis, namely translating in tangent space, resampling using weights determined by empirical likelihood and using a novel method to transform the original sample entirely within reflection shape space. We present results of extensive numerical simulations, on which basis we recommend a bootstrap test procedure that we expect will work well in practice. We demonstrate the procedure using a data set of human faces, to test whether humans in different age groups have a common mean face shape.

13.
Exact confidence regions for all the parameters in nonlinear regression models can be obtained by comparing the lengths of projections of the error vector into orthogonal subspaces of the sample space. In certain partially nonlinear models an alternative exact region is obtained by replacing the linear parameters by their conditional estimates in the projection matrices. An ellipsoidal approximation to the alternative region is obtained in terms of the tangent-plane coordinates, similar to one previously obtained for the more usual region. This ellipsoid can be converted to an approximate region for the original parameters and can be used to compare the two types of exact confidence regions.

14.
Existing sample statistics do little to address the question of multimodality, a question which is interesting in itself and which also arises in exploratory multivariate data analysis using projection pursuit. We propose a new index more strongly geared to the specific task of measuring multimodality than other sample statistics known to us. We show how to compute it, explore its properties, and consider its generalisation to the multivariate case. The behaviour of the index is illustrated by some simple numerical examples.

15.
In this paper we define a new class of designs for computer experiments. A projection array based design defines sets of simulation runs with properties that extend the conceptual properties of orthogonal array based Latin hypercube sampling, particularly to underlying design structures other than orthogonal arrays. Additionally, we illustrate how these designs can be sequentially augmented to improve the overall projection properties of the initial design or focus on interesting regions of the design space that need further exploration to improve the overall fit of the underlying response surface. We also illustrate how an initial Latin hypercube sample can be expressed as a projection array based design and show how one can augment these designs to improve higher dimensional space filling properties.

16.
A nonparametric discriminant analysis procedure that is robust to deviations from the usual assumptions is proposed. The procedure uses the projection pursuit methodology where the projection index is the two-group transvariation probability. We use allocation based on the centrality of the new point measured using a smooth version of point-group transvariation. It is shown that the new procedure provides lower misclassification error rates than competing methods for data from skewed heavy-tailed and skewed distributions as well as unequal training data sizes.

17.
Motivated by problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Because nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one‐dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi‐dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modelling and estimation procedure in a multi‐covariate multi‐response problem concerning concrete.

18.
For big data analysis, the high computational cost of Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve the computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo method, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.

19.
In quality control, we may confront imprecise concepts. One case is a situation in which upper and lower specification limits (SLs) are imprecise. If we introduce vagueness into SLs, we face quite new, reasonable and interesting processes, and the ordinary capability indices are not appropriate for measuring the capability of these processes. In this paper, similar to the traditional process capability indices (PCIs), we develop a fuzzy analogue by a distance defined on a fuzzy limit space and introduce PCIs, where instead of precise SLs we have two membership functions for upper and lower SLs. These indices are necessary when SLs are fuzzy, and they are helpful for comparing manufacturing processes with fuzzy SLs. Some interesting relations among these introduced indices are proved. Numerical examples are given to clarify the method.

20.
In this note we propose two procedures for testing homogeneity of covariance matrices that are both extensions of Hartley's (1940) test for equality of variances. The first is a two-stage procedure where the first step is a simple test for equality of the largest eigenvalues, and corresponding eigenvectors, of the covariance matrices. The second is based on projection pursuit and seems harder to apply in practice.
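
For orientation, the univariate statistic usually associated with Hartley's test for equal variances can be sketched as below. This is not the note's covariance-matrix extension. The function name `hartley_f_max` is hypothetical, and the sketch assumes equal group sizes and approximately normal data, as the classical test does.

```python
from statistics import variance

def hartley_f_max(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical data: three groups of equal size.
ratio = hartley_f_max([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 3.0, 5.0]])
```

A ratio near 1 is consistent with equal variances. Deciding whether a larger ratio is significant requires tabulated critical values, which this sketch does not compute.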
