Found 20 similar articles; search time 140 ms
1.
Sliced inverse regression (SIR) is an effective method for dimensionality reduction in high-dimensional regression problems. However, the method has requirements on the distribution of the predictors that are hard to check since they depend on unobserved variables. It has been shown that, if the distribution of the predictors is elliptical, then these requirements are satisfied. In the case of mixture models, ellipticity is violated and, in addition, there is no assurance of a single underlying regression model among the different components. Our approach clusters the predictor space to force the condition to hold on each cluster and includes a merging technique to look for different underlying models in the data. A study on simulated data as well as two real applications are provided. It appears that SIR, unsurprisingly, is not capable of dealing with a mixture of Gaussians involving different underlying models, whereas our approach is able to correctly investigate the mixture.
2.
Sliced Inverse Regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The
original method, however, requires the inversion of the predictors covariance matrix. In case of collinearity between these
predictors or small sample sizes compared to the dimension, the inversion is not possible and a regularization technique has
to be used. Our approach is based on a Fisher Lecture given by R.D. Cook where it is shown that SIR axes can be interpreted
as solutions of an inverse regression problem. We propose to introduce a Gaussian prior distribution on the unknown parameters
of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations
can enter our framework, which permits a global understanding of these methods. Three new priors are proposed leading to new
regularizations of the SIR method. A comparison on simulated data as well as an application to the estimation of Mars surface
physical properties from hyperspectral images are provided.
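The inversion problem mentioned in the abstract above can be illustrated with a minimal sketch of basic SIR in Python with NumPy. This is not the Gaussian-prior regularization proposed in that paper: the function name `sir_directions`, its parameters, and the simple `ridge` term added to the predictor covariance are all assumptions made here for illustration only.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1, ridge=0.0):
    """Basic sliced inverse regression (illustrative sketch).

    `ridge` is a hypothetical add-on: it adds ridge * I to the predictor
    covariance so that it stays invertible under collinearity.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)

    # Predictor covariance, optionally regularized.
    sigma = Xc.T @ Xc / n + ridge * np.eye(p)

    # Slice the sample by sorted response into roughly equal-sized groups.
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of the slice means of the centered predictors.
    gamma = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)

    # Inverse square root of sigma; this step fails (non-finite values)
    # when sigma is singular and ridge == 0.
    w_s, v_s = np.linalg.eigh(sigma)
    inv_sqrt = (v_s / np.sqrt(w_s)) @ v_s.T

    # Symmetric eigenproblem equivalent to gamma b = lambda * sigma b.
    _, v = np.linalg.eigh(inv_sqrt @ gamma @ inv_sqrt)
    top = v[:, ::-1][:, :n_directions]  # leading eigenvectors
    B = inv_sqrt @ top                  # back to the original predictor scale
    return B / np.linalg.norm(B, axis=0)
```

With `ridge=0.0` the sketch reduces to plain SIR and needs an invertible sample covariance. A small positive `ridge` is one simple way to obtain directions when predictors are collinear or the sample is small relative to the dimension, at the cost of some bias.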
3.
Jae Keun Yoo 《Statistics》2018,52(2):409-425
In this paper, a model-based approach to reduce the dimension of response variables in multivariate regression is newly proposed, following the existing context of the response dimension reduction developed by Yoo and Cook [Response dimension reduction for the conditional mean in multivariate regression. Comput Statist Data Anal. 2008;53:334–343]. The related dimension reduction subspace is estimated by maximum likelihood, assuming an additive error. In the new approach, the linearity condition, which is assumed for the methodological development in Yoo and Cook (2008), is understood through the covariance matrix of the random error. Numerical studies show potential advantages of the proposed approach over Yoo and Cook (2008). A real data example is presented for illustration.
4.
Jae Keun Yoo 《Journal of Statistical Computation and Simulation》2013,83(1):191-201
We present a novel approach to sufficient dimension reduction for the conditional kth moments in regression. The approach provides a computationally feasible test for the dimension of the central kth-moment subspace. In addition, we can test predictor effects without assuming any models. All test statistics proposed in the novel approach have asymptotic chi-squared distributions.
5.
Kofi Placid Adragni 《Journal of applied statistics》2015,42(2):347-359
We present a methodology for screening predictors that, given the response, follow one-parameter exponential family distributions. Screening predictors can be an important step in regressions when the number of predictors p is excessively large or larger than n, the number of observations. We consider instances where a large number of predictors are suspected to be irrelevant because they carry no information about the response. The proposed methodology helps remove these irrelevant predictors while capturing those linearly or nonlinearly related to the response.
6.
《Journal of nonparametric statistics》2012,24(4):1049-1071
Estimating an inverse regression space is especially important in sufficient dimension reduction. However, it typically requires a tuning parameter, such as the number of slices in a slicing method or bandwidth selection in a kernel estimation approach. Such a requirement not only affects the accuracy of estimates in a finite sample, but also increases difficulties for multivariate models. In this paper, we use a Fourier transform approach to avoid such difficulties and incorporate multivariate models. We further develop a Fourier transform approach to deal with variable selection, categorical predictor variables, and large p, small n data. To test the dimension, asymptotic results are obtained. Simulation studies and data analysis show the efficacy of our proposed methods.
7.
Li-Ping Zhu 《Communications in Statistics - Theory and Methods》2013,42(1):84-95
In the area of sufficient dimension reduction, two structural conditions are often assumed: the linearity condition, which is close to assuming ellipticity of the underlying distribution of the predictors, and the constant variance condition, which is close to assuming multivariate normality of the predictors. Imposing these conditions is considered a necessary trade-off for overcoming the “curse of dimensionality”. However, it is very hard to check whether these conditions hold or not. When these conditions are violated, some methods such as marginal transformation and re-weighting are suggested so that the data fulfill them approximately. In this article, we assume an independence condition between the projected predictors and their orthogonal complements, which ensures that the commonly used inverse regression methods identify the central subspace of interest. The independence condition can be checked by the gridded chi-square test. Thus, we extend the scope of many inverse regression methods and broaden their applicability in the literature. Simulation studies and an application to the car price data are presented for illustration.
8.
Sufficient dimension reduction (SDR) has been shown to be a powerful statistical method that is able to reduce the dimension of covariates without losing information with respect to the response. Subsequent analysis can then be based on lower-dimensional transformations of the covariates, which has the potential to assist model building and to increase estimation efficiency. In some situations, additional information may also be available during the data collection process. Although one can proceed with the conventional method, properly utilizing the additional information can greatly improve statistical inference. It is thus of interest to incorporate the additional information into the practice of SDR methods. In this article, we review the generalizations of SDR methods that are able to utilize different types of additional information. One will see that, depending on the sources of the additional information, different techniques are required to modify conventional SDR methods to improve estimation of the target of interest. WIREs Comput Stat 2017, 9:e1401. doi: 10.1002/wics.1401 This article is categorized under:
- Applications of Computational Statistics > Computational Mathematics
- Applications of Computational Statistics > Computational and Molecular Biology
- Statistical and Graphical Methods of Data Analysis > Dimension Reduction
9.
Efstathia Bura & R. Dennis Cook 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2001,63(2):393-410
A new estimation method for the dimension of a regression at the outset of an analysis is proposed. A linear subspace spanned by projections of the regressor vector X, which contains part or all of the modelling information for the regression of a vector Y on X, and its dimension are estimated by means of parametric inverse regression. Smooth parametric curves are fitted to the p inverse regressions via a multivariate linear model. No restrictions are placed on the distribution of the regressors. The estimate of the dimension of the regression is based on optimal estimation procedures. A simulation study shows the method to be more powerful than sliced inverse regression in some situations.
10.
In this paper, we introduce linear modeling of canonical correlation analysis, which estimates canonical direction matrices by minimising a quadratic objective function. The linear modeling results in a class of estimators of canonical direction matrices, and an optimal class is derived in the sense described herein. The optimal class guarantees several of the following desirable advantages: first, its estimates of canonical direction matrices are asymptotically efficient; second, its test statistic for determining the number of canonical covariates always has a chi‐squared distribution asymptotically; third, it is straightforward to construct tests for variable selection. The standard canonical correlation analysis and other existing methods turn out to be suboptimal members of the class. Finally, we study the role of canonical variates as a means of dimension reduction for predictors and responses in multivariate regression. Numerical studies and data analysis are presented.
11.
Lei Wang 《Journal of nonparametric statistics》2017,29(3):594-614
To make efficient inference for the mean of a response variable when the data are missing at random and the dimension of the covariates is not low, we construct three bias-corrected empirical likelihood (EL) methods in conjunction with dimension-reduced kernel estimation of the propensity and/or conditional mean response function. Consistency and asymptotic normality of the maximum dimension-reduced EL estimators are established. We further study the asymptotic properties of the resulting dimension-reduced EL ratio functions, and the corresponding EL confidence intervals for the response mean are constructed. The finite-sample performance of the proposed estimators is studied through simulation, and an application to an HIV-CD4 data set is also presented.
12.
Jae Keun Yoo 《Statistics》2016,50(5):1086-1099
The purpose of this paper is to define the central informative predictor subspace to contain the central subspace and to develop methods for estimating the former subspace. A potential advantage of the proposed methods is that they do not require the linearity, constant variance and coverage conditions in their methodological development. Therefore, the central informative predictor subspace gives us the benefit of restoring the central subspace exhaustively even when these conditions fail. Numerical studies confirm the theories, and real data analyses are presented.
13.
Howard D. Bondell & Lexin Li 《Journal of the Royal Statistical Society. Series B, Statistical methodology》2009,71(1):287-299
Summary. The family of inverse regression estimators that was recently proposed by Cook and Ni has proven effective in dimension reduction by transforming the high dimensional predictor vector to its low dimensional projections. We propose a general shrinkage estimation strategy for the entire inverse regression estimation family that is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimators achieve consistency in variable selection without requiring any traditional model, meanwhile retaining the root n estimation consistency of the dimension reduction basis. We also show the effectiveness of the new estimators through both simulation and real data analysis.
14.
Based on the theories of sliced inverse regression (SIR) and reproducing kernel Hilbert space (RKHS), a new approach RDSIR (RKHS-based Double SIR) to nonlinear dimension reduction for survival data is proposed. An isometric isomorphism is constructed based on the RKHS property, then the nonlinear function in the RKHS can be represented by the inner product of two elements that reside in the isomorphic feature space. Due to the censorship of survival data, double slicing is used to estimate the weight function to adjust for the censoring bias. The nonlinear sufficient dimension reduction (SDR) subspace is estimated by a generalized eigen-decomposition problem. The asymptotic property of the estimator is established based on the perturbation theory. Finally, the performance of RDSIR is illustrated on simulated and real data. The numerical results show that RDSIR is comparable with the linear SDR method. Most importantly, RDSIR can also effectively extract nonlinearity from survival data.
15.
Motivated by problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Because nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one‐dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single‐indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi‐dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modelling and estimation procedure in a multi‐covariate multi‐response problem concerning concrete.
16.
In this article, a new method named the cumulative slicing principal fitted component (CUPFC) model is proposed to conduct sufficient dimension reduction and prediction in regression. Based on the classical PFC methods, the CUPFC avoids selecting some parameters such as the specific basis function form or the number of slices in slicing estimation. We develop the estimator of the central subspace in the CUPFC method under three error-term structures and establish its consistency. The simulations compare the effectiveness of the new method in prediction and reduction estimation with that of other competitors. The results indicate that the newly proposed method generally outperforms the existing PFC methods no matter how the predictors are truly related to the response. The application to real data also verifies the validity of the proposed method.
17.
An envelope is a relatively new construct for decreasing estimative and predictive variation relative to standard methods in multivariate statistics, sometimes by amounts equivalent to increasing the sample size many times over. Essentially a form of targeted dimension reduction that is descended from sufficient dimension reduction, an envelope inherits its underlying philosophy from Fisher's notion of sufficient statistics. The initial development of envelope methods took place largely in the context of the multivariate linear model, resulting in response envelopes for response reduction, predictor envelopes for predictor reduction, simultaneous envelopes for response and predictor reduction and partial envelopes for specialized considerations, each demonstrating a potential for substantial reduction in estimative variation. These advances demonstrated that there are close connections between envelopes and some standard multivariate methods like partial least squares regression and canonical correlations. Subsequently, envelope methodology has been adapted and extended to diverse areas, including envelopes for regressions with matrix and tensor‐valued responses, envelopes for spatial statistics, quantile envelopes for quantile regression, and Bayesian response envelopes. Sparse versions of response and predictor envelopes have also been developed. More generally, there is also envelope methodology for reducing the variation in any asymptotically normal vector‐valued estimator. These advances have opened a new chapter in multivariate statistics, allowing variation that is material to the goals of an analysis to be separated effectively from immaterial variation that serves only to confound estimation and prediction. This article is categorized under:
- Statistical Models > Multivariate Models
- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
- Statistical and Graphical Methods of Data Analysis > Dimension Reduction
18.
Variable selection is a very important tool when dealing with high dimensional data. However, most popular variable selection methods are model based, which might provide misleading results when the model assumption is not satisfied. Sufficient dimension reduction provides a general framework for model-free variable selection methods. In this paper, we propose a model-free variable selection method via sufficient dimension reduction, which incorporates the grouping information into the selection procedure for multi-population data. Theoretical properties of our selection methods are also discussed. Simulation studies suggest that our method greatly outperforms those ignoring the grouping information.
19.
20.
Wenbo Wu, Haileab Hilafu & Yuan Xue 《Journal of Statistical Computation and Simulation》2019,89(12):2354-2372
Estimation of a general multi-index model comprises determining the number of linear combinations of predictors (structural dimension) that are related to the response, estimating the loadings of each index vector, selecting the active predictors and estimating the underlying link function. These objectives are often achieved sequentially at different stages of the estimation process. In this study, we propose a unified estimation approach under a semi-parametric model framework to attain these estimation goals simultaneously. The proposed estimation method is more efficient and stable than many existing methods where the estimation error in the structural dimension may propagate to the estimation of the index vectors and variable selection stages. A detailed algorithm is provided to implement the proposed method. Comprehensive simulations and a real data analysis illustrate the effectiveness of the proposed method.