20 similar articles found (search time: 15 ms)
1.
The idea of searching for orthogonal projections, from a multidimensional space into a linear subspace, as an aid to detecting non-linear structure has been named exploratory projection pursuit. Most approaches are tied to the idea of searching for interesting projections. Typically, an interesting projection is one where the distribution of the projected data differs from the normal distribution. In this paper we define two projection indices which are aimed specifically at finding projections that best show grouped structure in the plane, if this exists in the multi-dimensional space. These involve a numerical optimization problem which is tackled in two stages, the projection and the pursuit; the first is based on a procedure to generate pseudo-random rotation matrices in the sense of the grand tour by D. Asimov (1985), and the second is a local numerical optimization procedure. One artificial and one real example illustrate the performance of the suggested indices.
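The two-stage scheme described in this abstract (random planes first, local optimization second) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not give the two indices, so a holes-style index is swapped in as a stand-in, random planes come from a QR decomposition of a Gaussian matrix rather than a grand-tour generator, and the local step is a plain random-perturbation hill climb. The names `random_plane`, `holes_index`, and `pursue` are hypothetical.

```python
import numpy as np

def random_plane(p, rng):
    """Return a p x 2 matrix with orthonormal columns (a random 2-D projection)."""
    q, _ = np.linalg.qr(rng.standard_normal((p, 2)))
    return q

def holes_index(Y):
    """Stand-in index (assumption): larger when fewer projected points sit near the origin."""
    d = Y.shape[1]
    centre_mass = np.mean(np.exp(-0.5 * np.sum(Y**2, axis=1)))
    return (1.0 - centre_mass) / (1.0 - np.exp(-d / 2.0))

def pursue(X, n_starts=20, n_steps=200, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Centre and scale each column (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Z.shape[1]
    # Stage 1 (projection): draw random planes and keep the best one.
    starts = [random_plane(p, rng) for _ in range(n_starts)]
    A = max(starts, key=lambda a: holes_index(Z @ a))
    best = holes_index(Z @ A)
    # Stage 2 (pursuit): perturb, re-orthonormalise, accept only improvements.
    for _ in range(n_steps):
        cand, _ = np.linalg.qr(A + step * rng.standard_normal(A.shape))
        val = holes_index(Z @ cand)
        if val > best:
            A, best = cand, val
    return A, best
```

Calling `pursue(X)` on an n x p data matrix returns the projection matrix and its index value; any other index with the same signature could replace `holes_index`.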
2.
Variable selection is a very important tool when dealing with high dimensional data. However, most popular variable selection methods are model based, which might provide misleading results when the model assumption is not satisfied. Sufficient dimension reduction provides a general framework for model-free variable selection methods. In this paper, we propose a model-free variable selection method via sufficient dimension reduction, which incorporates the grouping information into the selection procedure for multi-population data. Theoretical properties of our selection methods are also discussed. Simulation studies suggest that our method greatly outperforms those ignoring the grouping information.
3.
Trevor Park 《Journal of applied statistics》2007,34(7):767-777
The penalized likelihood principal component method of Park (2005) offers flexibility in the choice of the penalty function. This flexibility allows the method to be tailored to enhance interpretation in special cases. Of particular interest is a penalty function in the style of the Lasso that can be used to produce exactly zero loadings. Also of interest is a penalty function for cases in which interpretability is best represented by alignment with orthogonal subspaces, rather than with axis directions. In each case, a data example is presented.
4.
Giovanna Menardi, Nicola Torelli 《Journal of Statistical Computation and Simulation》2013,83(11):2047-2063
Clustering high-dimensional data is often a challenging task both because of the computational burden required to run any technique, and because the difficulty in interpreting clusters generally increases with the data dimension. In this work, a method for finding low-dimensional representations of high-dimensional data is discussed, specifically conceived to preserve possible clusters in data. It is based on the critical bandwidth, a nonparametric statistic to test unimodality, related to kernel density estimation. Some useful properties of this statistic are highlighted and an adjustment to use it as a basis for reducing dimensionality is suggested. The method is illustrated by simulated and real data examples.
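For readers unfamiliar with the critical bandwidth mentioned here, the univariate statistic can be sketched as the smallest Gaussian-kernel bandwidth at which the kernel density estimate has at most one mode. The code below is only an illustration of that definition, assuming a Gaussian kernel, mode counting on a fixed grid, and a bisection search; it does not reproduce the adjustment or the dimension-reduction step proposed in the paper, and the names `count_modes` and `critical_bandwidth` are hypothetical.

```python
import numpy as np

def count_modes(x, h, grid):
    """Count local maxima of a Gaussian kernel density estimate evaluated on a grid."""
    z = (grid[:, None] - x[None, :]) / h
    dens = np.exp(-0.5 * z**2).sum(axis=1)  # unnormalised: scaling does not move the modes
    d = np.diff(dens)
    # A mode is a grid point where the slope turns from positive to non-positive.
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

def critical_bandwidth(x, n_grid=1024, n_iter=40):
    """Bisection for the smallest bandwidth giving at most one mode (grid approximation)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    grid = np.linspace(x.min(), x.max(), n_grid)  # modes of a Gaussian KDE lie within the data range
    lo, hi = 1e-3 * span, 2.0 * span
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if count_modes(x, mid, grid) <= 1:
            hi = mid  # already unimodal: try a smaller bandwidth
        else:
            lo = mid  # still multimodal: more smoothing is needed
    return hi
```

The bisection relies on the number of modes of a Gaussian kernel estimate not increasing as the bandwidth grows. A clearly bimodal sample should therefore give a larger value than a unimodal one, which is why the statistic is used to test unimodality.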
5.
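Projection pursuit clustering based on a genetic algorithm. Total citations: 2 (self-citations: 0; citations by others: 2)
PROCLUS, a traditional projection pursuit clustering algorithm, is effective for clustering high-dimensional data. However, it uses hill climbing to iterate over the cluster centers and select the best ones; because hill climbing is a local search method, the solution it finds may be only a local optimum. To address this shortcoming, an improved projection pursuit clustering algorithm is proposed that uses a genetic algorithm to iterate over the cluster centers in search of the global optimum. Simulation experiments demonstrate the feasibility and effectiveness of the new algorithm.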
6.
Han Lin Shang 《Journal of Statistical Computation and Simulation》2019,89(5):795-814
Univariate time series often take the form of a collection of curves observed sequentially over time. Examples of these include hourly ground-level ozone concentration curves. These curves can be viewed as a time series of functions observed at equally spaced intervals over a dense grid. Since functional time series may contain various types of outliers, we introduce a robust functional time series forecasting method to down-weight the influence of outliers in forecasting. Through a robust principal component analysis based on projection pursuit, a time series of functions can be decomposed into a set of robust dynamic functional principal components and their associated scores. Conditioning on the estimated functional principal components, the crux of the curve-forecasting problem lies in modelling and forecasting principal component scores, through a robust vector autoregressive forecasting method. Via a simulation study and an empirical study on forecasting ground-level ozone concentration, the robust method demonstrates the superior forecast accuracy that dynamic functional principal component regression entails. The robust method also shows the superior estimation accuracy of the parameters in the vector autoregressive models for modelling and forecasting principal component scores, and thus improves curve forecast accuracy.
7.
Principal fitted component (PFC) models are a class of likelihood-based inverse regression methods that yield a so-called sufficient reduction of the random p-vector of predictors X given the response Y. Assuming that a large number of the predictors have no information about Y, we aimed to obtain an estimate of the sufficient reduction that ‘purges’ these irrelevant predictors, and thus selects the most useful ones. We devised a procedure using observed significance values from the univariate fittings to yield a sparse PFC, a purged estimate of the sufficient reduction. The performance of the method is compared to that of penalized forward linear regression models for variable selection in high-dimensional settings.
8.
Exploratory data structure comparisons: three new visual tools based on principal component analysis
Anne Helby Petersen, Bo Markussen, Karl Bang Christensen 《Journal of applied statistics》2021,48(9):1675
Datasets are sometimes divided into distinct subsets, e.g. due to multi-center sampling, or to variations in instruments, questionnaire item ordering or mode of administration, and the data analyst then needs to assess whether a joint analysis is meaningful. The Principal Component Analysis-based Data Structure Comparisons (PCADSC) tools are three new non-parametric, visual diagnostic tools for investigating differences in structure for two subsets of a dataset through covariance matrix comparisons by use of principal component analysis. The PCADSC tools are demonstrated in a data example using European Social Survey data on psychological well-being in three countries, Denmark, Sweden, and Bulgaria. The data structures are found to be different in Denmark and Bulgaria, and thus a comparison of, for example, mean psychological well-being scores is not meaningful. However, when comparing Denmark and Sweden, very similar data structures, and thus comparable concepts of well-being, are found. Therefore, inter-country comparisons are warranted for these countries.
9.
《Journal of Statistical Computation and Simulation》2012,82(12):2411-2428
Process capability indices measure the ability of a process to provide products that meet certain specifications. Few references deal with the capability of a process characterized by a functional relationship between a response variable and one or more explanatory variables, which is called a profile. Specifically, no reference analyses the capability of processes characterized by multivariate nonlinear profiles. In this paper, we propose a method to measure the capability of these processes, based on principal components for multivariate functional data and the concept of functional depth. A simulation study is conducted to assess the performance of the proposed method. An example from sugar production illustrates the applicability of this approach.
10.
11.
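To study how well economic development in Shaanxi Province was coordinated with the environment, resources, and living standards from 2000 to 2008, representative indicators were selected and reduced in dimension using projection pursuit. The overall coordination of Shaanxi's economic development from 2000 to 2008 was evaluated from the projection function values, and methods such as grey relational analysis were used to compare three relationships: economy and environment, economy and resources, and economy and living standards. The current state of coordinated economic development in Shaanxi and its main problems are then described.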
12.
Combining multiple biomarkers to improve diagnostic accuracy is meaningful for practitioners and clinicians and attracts many researchers. With the development of modern techniques, functional markers such as curves or images now play an important role in diagnosis. There is a rich literature on combination methods for continuous scalar markers. Unfortunately, only sporadic works have studied how functional markers affect diagnosis. Moreover, no publication can be found that combines multiple functional markers to improve diagnostic accuracy. Scalar combination methods cannot be applied directly to multiple functional markers because functional markers are infinite-dimensional. In this article, we propose a one-dimensional scalar feature motivated by the square loss distance as an alternative to the original functional curve, in the sense that it retains as much information as possible. The square loss distance is defined as a function of the projection scores generated from a functional principal component decomposition. After this dimension reduction, a variety of existing scalar combination methods can be applied to the scalar features of the functional markers to improve diagnostic accuracy. The area under the receiver operating characteristic curve and the Youden index are used to assess the performance of the various methods in numerical studies. We also analyze high or low hospital admissions due to respiratory diseases between 2010 and 2017 in Hong Kong by combining weather conditions and media information, which are regarded as functional markers. Finally, we provide an R function for convenient application.
13.
Projection techniques for nonlinear principal component analysis. Total citations: 4 (self-citations: 0; citations by others: 4)
Principal Components Analysis (PCA) is traditionally a linear technique for projecting multidimensional data onto lower dimensional subspaces with minimal loss of variance. However, there are several applications where the data lie in a lower dimensional subspace that is not linear; in these cases linear PCA is not the optimal method to recover this subspace and thus account for the largest proportion of variance in the data. Nonlinear PCA addresses the nonlinearity problem by relaxing the linear restrictions on standard PCA. We investigate both linear and nonlinear approaches to PCA both exclusively and in combination. In particular we introduce a combination of projection pursuit and nonlinear regression for nonlinear PCA. We compare the success of PCA techniques in variance recovery by applying linear, nonlinear and hybrid methods to some simulated and real data sets. We show that the best linear projection that captures the structure in the data (in the sense that the original data can be reconstructed from the projection) is not necessarily a (linear) principal component. We also show that the ability of certain nonlinear projections to capture data structure is affected by the choice of constraint in the eigendecomposition of a nonlinear transform of the data. Similar success in recovering data structure was observed for both linear and nonlinear projections.
14.
Giovanni Maria Merola 《Journal of applied statistics》2020,47(8):1325
We propose an algorithmic framework for computing sparse components from rotated principal components. This methodology, called SIMPCA, is useful to replace the unreliable practice of ignoring small coefficients of rotated components when interpreting them. The algorithm computes genuinely sparse components by projecting rotated principal components onto subsets of variables. The components simplified in this way are highly correlated with the corresponding components. By choosing different simplification strategies, different sparse solutions can be obtained, which can be used to compare alternative interpretations of the principal components. We give some examples of how effective simplified solutions can be achieved with SIMPCA using some publicly available data sets.
15.
Jae Keun Yoo 《Statistics》2018,52(2):409-425
In this paper, a model-based approach to reduce the dimension of response variables in multivariate regression is newly proposed, following the existing context of the response dimension reduction developed by Yoo and Cook [Response dimension reduction for the conditional mean in multivariate regression. Comput Statist Data Anal. 2008;53:334–343]. The related dimension reduction subspace is estimated by maximum likelihood, assuming an additive error. In the new approach, the linearity condition, which is assumed for the methodological development in Yoo and Cook (2008), is understood through the covariance matrix of the random error. Numerical studies show potential advantages of the proposed approach over Yoo and Cook (2008). A real data example is presented for illustration.
16.
In this note we extend univariate tests for normality and symmetry based on empirical characteristic functions to the multivariate case.
17.
《Journal of Statistical Computation and Simulation》2012,82(11):2298-2315
Principal component analysis (PCA) is a widely used statistical technique for determining subscales in questionnaire data. As with any other statistical technique, missing data may complicate both its execution and its interpretation. In this study, six methods for dealing with missing data in the context of PCA are reviewed and compared: listwise deletion (LD), pairwise deletion, the missing data passive approach, regularized PCA, the expectation-maximization algorithm, and multiple imputation. Simulations show that except for LD, all methods give about equally good results for realistic percentages of missing data. Therefore, the choice of a procedure can be based on the ease of application or purely the convenience of availability of a technique.
18.
Statistics, as one of the applied sciences, has a great impact on a vast range of other sciences. Predicting protein structures, with great emphasis on their geometrical features via dihedral angles, has brought into play a new branch of statistics, known as directional statistics. One of the available techniques for such prediction is molecular dynamics simulation, which produces high-dimensional molecular structure data. Hence, principal component analysis (PCA) is expected to address some related statistical problems, particularly reducing the dimension of the variables involved. Since dihedral angles are variables on a non-Euclidean space (their locus is the torus), direct application of PCA is not expected to be very informative in this case. Principal geodesic analysis is one of the recent methods for reducing dimension in the non-Euclidean case. A procedure for using this technique to reduce the dimension of a set of dihedral angles is highlighted in this paper. We further propose an extension of this tool, implemented in such a way that the torus is approximated by the product of two unit circles, and evaluate its application to a real data set. A comparison of this technique with some previous methods is also undertaken.
19.
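Differences in insurance density among the eastern, central, and western regions of China from 2000 to 2006 are compared, principal component analysis is applied to the factors influencing insurance density, and panel data models are used to run separate regressions for the eastern, central, and western regions. The results show that the main factors behind regional differences in insurance density include regional GDP per capita, per capita consumption, education level, urbanization, industrial structure, social welfare expenditure, sex ratio, and age structure, and that both the influencing factors and the size of their effects differ by region. To narrow regional differences in insurance density, policy measures tailored to each region should be adopted.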
20.
Principal axis factoring (PAF) and maximum likelihood factor analysis (MLFA) are two of the most popular estimation methods in exploratory factor analysis. It is known that PAF is better able to recover weak factors and that the maximum likelihood estimator is asymptotically efficient. However, there is almost no evidence regarding which method should be preferred for different types of factor patterns and sample sizes. Simulations were conducted to investigate factor recovery by PAF and MLFA for distortions of ideal simple structure and sample sizes between 25 and 5000. Results showed that PAF is preferred for population solutions with few indicators per factor and for overextraction. MLFA outperformed PAF in cases of unequal loadings within factors and for underextraction. It was further shown that PAF and MLFA do not always converge with increasing sample size. The simulation findings were confirmed by an empirical study as well as by a classic plasmode, Thurstone's box problem. The present results are of practical value for factor analysts.