首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The existence of a dimension reduction (DR) subspace is a common assumption in regression analysis when dealing with high-dimensional predictors. The estimation of such a DR subspace has received considerable attention in the past few years, the most popular method being undoubtedly the sliced inverse regression. In this paper, we propose a new estimation procedure of the DR subspace by assuming that the joint distribution of the predictor and the response variables is a finite mixture of distributions. The new method is compared through a simulation study to some classical methods.  相似文献   

2.
Andreas Artemiou 《Statistics》2013,47(5):1037-1051
In this paper, we combine adaptively weighted large margin classifiers with Support Vector Machine (SVM)-based dimension reduction methods to create dimension reduction methods robust to the presence of extreme outliers. We discuss estimation and asymptotic properties of the algorithm. The good performance of the new algorithm is demonstrated through simulations and real data analysis.  相似文献   

3.
Abstract

In this note, we present a theoretical result which relaxes a critical condition required by the semiparametric approach to dimension reduction. The asymptotic normality of the estimators still maintains under weaker assumptions. This improvement greatly increases the applicability of the semiparametric approach.  相似文献   

4.
5.
6.
Summary.  Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data.  相似文献   

7.
We present a novel approach to sufficient dimension reduction for the conditional kth moments in regression. The approach provides a computationally feasible test for the dimension of the central kth-moment subspace. In addition, we can test predictor effects without assuming any models. All test statistics proposed in the novel approach have asymptotic chi-squared distributions.  相似文献   

8.
The analysis of high-dimensional data often begins with the identification of lower dimensional subspaces. Principal component analysis is a dimension reduction technique that identifies linear combinations of variables along which most variation occurs or which best “reconstruct” the original variables. For example, many temperature readings may be taken in a production process when in fact there are just a few underlying variables driving the process. A problem with principal components is that the linear combinations can seem quite arbitrary. To make them more interpretable, we introduce two classes of constraints. In the first, coefficients are constrained to equal a small number of values (homogeneity constraint). The second constraint attempts to set as many coefficients to zero as possible (sparsity constraint). The resultant interpretable directions are either calculated to be close to the original principal component directions, or calculated in a stepwise manner that may make the components more orthogonal. A small dataset on characteristics of cars is used to introduce the techniques. A more substantial data mining application is also given, illustrating the ability of the procedure to scale to a very large number of variables.  相似文献   

9.
10.
We discuss the covariate dimension reduction properties of conditional density ratios in the estimation of balanced contrasts of expectations. Conditional density ratios, as well as related sufficient summaries, can be used to replace the covariates with a smaller number of variables. For example, for comparisons among k   populations the covariates can be replaced with k-1k-1 conditional density ratios. The dimension reduction properties of conditional density ratios are directly connected with sufficiency, the dimension reduction concepts considered in regression theory, and propensity theory. The theory presented here extends the ideas in propensity theory to situations in which propensities do not exist and develops an approach to dimension reduction outside of the potential outcomes or counterfactual framework. Under general conditions, we show that a principal components transformation of the estimated conditional density ratios can be used to investigate whether a sufficient summary of dimension lower than k-1k-1 exists and to identify such a lower dimensional summary.  相似文献   

11.
In this paper, we perform an empirical comparison of the classification error of several ensemble methods based on classification trees. This comparison is performed by using 14 data sets that are publicly available and that were used by Lim, Loh and Shih [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. The methods considered are a single tree, Bagging, Boosting (Arcing) and random forests (RF). They are compared from different perspectives. More precisely, we look at the effects of noise and of allowing linear combinations in the construction of the trees, the differences between some splitting criteria and, specifically for RF, the effect of the number of variables from which to choose the best split at each given node. Moreover, we compare our results with those obtained by Lim et al. [Lim, T., Loh, W. and Shih, Y.-S., 2000, A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.]. In this study, the best overall results are obtained with RF. In particular, RF are the most robust against noise. The effect of allowing linear combinations and the differences between splitting criteria are small on average, but can be substantial for some data sets.  相似文献   

12.
A new method of statistical classification (discrimination) is proposed. The method is most effective for high dimension, low sample size data. It uses a robust mean difference as the direction vector and locates the classification boundary by minimizing the error rates. Asymptotic results for assessment and comparison to several popular methods are obtained by using a type of asymptotics of finite sample size and infinite dimensions. The value of the proposed approach is demonstrated by simulations. Real data examples are used to illustrate the performance of different classification methods.  相似文献   

13.
In this paper, we propose several dimension reduction methods when the covariates are measured with additive distortion measurement errors. These distortions are modelled by unknown functions of a commonly observable confounding variable. To estimate the central subspace, we propose residuals-based dimension reduction estimation methods and direct estimation methods. The consistency and asymptotic normality of the proposed estimators are investigated. Furthermore, we conduct some simulations to evaluate the performance of our proposed method and compare with existing methods, and a real data set is analysed for illustration.  相似文献   

14.
Sufficient dimension reduction (SDR) is a popular supervised machine learning technique that reduces the predictor dimension and facilitates subsequent data analysis in practice. In this article, we propose principal weighted logistic regression (PWLR), an efficient SDR method in binary classification where inverse-regression-based SDR methods often suffer. We first develop linear PWLR for linear SDR and study its asymptotic properties. We then extend it to nonlinear SDR and propose the kernel PWLR. Evaluations with both simulated and real data show the promising performance of the PWLR for SDR in binary classification.  相似文献   

15.
A new estimation method for the dimension of a regression at the outset of an analysis is proposed. A linear subspace spanned by projections of the regressor vector X , which contains part or all of the modelling information for the regression of a vector Y on X , and its dimension are estimated via the means of parametric inverse regression. Smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model. No restrictions are placed on the distribution of the regressors. The estimate of the dimension of the regression is based on optimal estimation procedures. A simulation study shows the method to be more powerful than sliced inverse regression in some situations.  相似文献   

16.
The idea of dimension reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Central subspaces are designed to capture all the information for the regression and to provide a population structure for dimension reduction. Here, we introduce the central k th-moment subspace to capture information from the mean, variance and so on up to the k th conditional moment of the regression. New methods are studied for estimating these subspaces. Connections with sliced inverse regression are established, and examples illustrating the theory are presented.  相似文献   

17.
In this article, a new method named cumulative slicing principle fitted component (CUPFC) model is proposed to conduct sufficient dimension reduction and prediction in regression. Based on the classical PFC methods, the CUPFC avoids selecting some parameters such as the specific basis function form or the number of slices in slicing estimation. We develop the estimator of the central subspace in the CUPFC method under three error-term structures and establish its consistency. The simulations investigate the effectiveness of the new method in prediction and reduction estimation with other competitors. The results indicate that the new proposed method generally outperforms the existing PFC methods no matter how the predictors are truly related to the response. The application to real data also verifies the validity of the proposed method.  相似文献   

18.
Statistical learning is emerging as a promising field where a number of algorithms from machine learning are interpreted as statistical methods and vice-versa. Due to good practical performance, boosting is one of the most studied machine learning techniques. We propose algorithms for multivariate density estimation and classification. They are generated by using the traditional kernel techniques as weak learners in boosting algorithms. Our algorithms take the form of multistep estimators, whose first step is a standard kernel method. Some strategies for bandwidth selection are also discussed with regard both to the standard kernel density classification problem, and to our 'boosted' kernel methods. Extensive experiments, using real and simulated data, show an encouraging practical relevance of the findings. Standard kernel methods are often outperformed by the first boosting iterations and in correspondence of several bandwidth values. In addition, the practical effectiveness of our classification algorithm is confirmed by a comparative study on two real datasets, the competitors being trees including AdaBoosting with trees.  相似文献   

19.
In this article, we propose a new method for sufficient dimension reduction when both response and predictor are vectors. The new method, using distance covariance, keeps the model-free advantage, and can fully recover the central subspace even when many predictors are discrete. We then extend this method to the dual central subspace, including a special case of canonical correlation analysis. We illustrated estimators through extensive simulations and real datasets, and compared to some existing methods, showing that our estimators are competitive and robust.  相似文献   

20.
Dimension reduction with bivariate responses, especially a mix of a continuous and categorical responses, can be of special interest. One immediate application is to regressions with censoring. In this paper, we propose two novel methods to reduce the dimension of the covariates of a bivariate regression via a model-free approach. Both methods enjoy a simple asymptotic chi-squared distribution for testing the dimension of the regression, and also allow us to test the contributions of the covariates easily without pre-specifying a parametric model. The new methods outperform the current one both in simulations and in analysis of a real data. The well-known PBC data are used to illustrate the application of our method to censored regression.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号