首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 921 毫秒
1.

Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data comprised of computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the ridge directions’ span; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since SIR and SAVE may mistakenly suggest that truly important directions are unimportant.

  相似文献   

2.

Sufficient dimension reduction (SDR) provides a framework for reducing the predictor space dimension in statistical regression problems. We consider SDR in the context of dimension reduction for deterministic functions of several variables such as those arising in computer experiments. In this context, SDR can reveal low-dimensional ridge structure in functions. Two algorithms for SDR—sliced inverse regression (SIR) and sliced average variance estimation (SAVE)—approximate matrices of integrals using a sliced mapping of the response. We interpret this sliced approach as a Riemann sum approximation of the particular integrals arising in each algorithm. We employ the well-known tools from numerical analysis—namely, multivariate numerical integration and orthogonal polynomials—to produce new algorithms that improve upon the Riemann sum-based numerical integration in SIR and SAVE. We call the new algorithms Lanczos–Stieltjes inverse regression (LSIR) and Lanczos–Stieltjes average variance estimation (LSAVE) due to their connection with Stieltjes’ method—and Lanczos’ related discretization—for generating a sequence of polynomials that are orthogonal with respect to a given measure. We show that this approach approximates the desired integrals, and we study the behavior of LSIR and LSAVE with two numerical examples. The quadrature-based LSIR and LSAVE eliminate the first-order algebraic convergence rate bottleneck resulting from the Riemann sum approximation, thus enabling high-order numerical approximations of the integrals when appropriate. Moreover, LSIR and LSAVE perform as well as the best-case SIR and SAVE implementations (e.g., adaptive partitioning of the response space) when low-order numerical integration methods (e.g., simple Monte Carlo) are used.

  相似文献   

3.
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of the high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, which is a hybrid of the SIR method using the geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that the isometric SIR (ISOSIR) can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison to several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.  相似文献   

4.
Sufficient dimension reduction (SDR) is a popular supervised machine learning technique that reduces the predictor dimension and facilitates subsequent data analysis in practice. In this article, we propose principal weighted logistic regression (PWLR), an efficient SDR method in binary classification where inverse-regression-based SDR methods often suffer. We first develop linear PWLR for linear SDR and study its asymptotic properties. We then extend it to nonlinear SDR and propose the kernel PWLR. Evaluations with both simulated and real data show the promising performance of the PWLR for SDR in binary classification.  相似文献   

5.
Traditionally, time series analysis involves building an appropriate model and using either parametric or nonparametric methods to make inference about the model parameters. Motivated by recent developments for dimension reduction in time series, an empirical application of sufficient dimension reduction (SDR) to nonlinear time series modelling is shown in this article. Here, we use time series central subspace as a tool for SDR and estimate it using mutual information index. Especially, in order to reduce the computational complexity in time series, we propose an efficient estimation method of minimal dimension and lag using a modified Schwarz–Bayesian criterion, when either of the dimensions and the lags is unknown. Through simulations and real data analysis, the approach presented in this article performs well in autoregression and volatility estimation.  相似文献   

6.
Abstract

A simple method based on sliced inverse regression (SIR) is proposed to explore an effective dimension reduction (EDR) vector for the single index model. We avoid the principle component analysis step of the original SIR by using two sample mean vectors in two slices of the response variable and their difference vector. The theories become simpler, the method is equivalent to the multiple linear regression with dichotomized response, and the estimator can be expressed by a closed form, although the objective function might be an unknown nonlinear. It can be applied for the case when the number of covariates is large, and it requires no matrix operation or iterative calculation.  相似文献   

7.
Sliced Inverse Regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The original method, however, requires the inversion of the predictors covariance matrix. In case of collinearity between these predictors or small sample sizes compared to the dimension, the inversion is not possible and a regularization technique has to be used. Our approach is based on a Fisher Lecture given by R.D. Cook where it is shown that SIR axes can be interpreted as solutions of an inverse regression problem. We propose to introduce a Gaussian prior distribution on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations can enter our framework, which permits a global understanding of these methods. Three new priors are proposed leading to new regularizations of the SIR method. A comparison on simulated data as well as an application to the estimation of Mars surface physical properties from hyperspectral images are provided.  相似文献   

8.
We propose a new method for dimension reduction in regression using the first two inverse moments. We develop corresponding weighted chi-squared tests for the dimension of the regression. The proposed method considers linear combinations of sliced inverse regression (SIR) and the method using a new candidate matrix which is designed to recover the entire inverse second moment subspace. The optimal combination may be selected based on the p-values derived from the dimension tests. Theoretically, the proposed method, as well as sliced average variance estimate (SAVE), is more capable of recovering the complete central dimension reduction subspace than SIR and principle Hessian directions (pHd). Therefore it can substitute for SIR, pHd, SAVE, or any linear combination of them at a theoretical level. Simulation study indicates that the proposed method may have consistently greater power than SIR, pHd, and SAVE.  相似文献   

9.
In this paper, we consider the ultrahigh-dimensional sufficient dimension reduction (SDR) for censored data and measurement error in covariates. We first propose the feature screening procedure based on censored data and the covariates subject to measurement error. With the suitable correction of mismeasurement, the error-contaminated variables detected by the proposed feature screening procedure are the same as the truly important variables. Based on the selected active variables, we develop the SDR method to estimate the central subspace and the structural dimension with both censored data and measurement error incorporated. The theoretical results of the proposed method are established. Simulation studies are reported to assess the performance of the proposed method. The proposed method is implemented to NKI breast cancer data.  相似文献   

10.
Sliced Inverse Regression (SIR) is a promising technique for the purpose of dimension reduction. Several properties of this method have been examined already, but little attention has been paid to robustness aspects. In this article, we focus on the sensitivity of SIR to outliers and show in what sense and how severely SIR can be influenced by outliers in the data.  相似文献   

11.

We propose a semiparametric framework based on sliced inverse regression (SIR) to address the issue of variable selection in functional regression. SIR is an effective method for dimension reduction which computes a linear projection of the predictors in a low-dimensional space, without loss of information on the regression. In order to deal with the high dimensionality of the predictors, we consider penalized versions of SIR: ridge and sparse. We extend the approaches of variable selection developed for multidimensional SIR to select intervals that form a partition of the definition domain of the functional predictors. Selecting entire intervals rather than separated evaluation points improves the interpretability of the estimated coefficients in the functional framework. A fully automated iterative procedure is proposed to find the critical (interpretable) intervals. The approach is proved efficient on simulated and real data. The method is implemented in the R package SISIR available on CRAN at https://cran.r-project.org/package=SISIR.

  相似文献   

12.
We propose a flexible functional approach for modelling generalized longitudinal data and survival time using principal components. In the proposed model the longitudinal observations can be continuous or categorical data, such as Gaussian, binomial or Poisson outcomes. We generalize the traditional joint models that treat categorical data as continuous data by using some transformations, such as CD4 counts. The proposed model is data-adaptive, which does not require pre-specified functional forms for longitudinal trajectories and automatically detects characteristic patterns. The longitudinal trajectories observed with measurement error or random error are represented by flexible basis functions through a possibly nonlinear link function, combining dimension reduction techniques resulting from functional principal component (FPC) analysis. The relationship between the longitudinal process and event history is assessed using a Cox regression model. Although the proposed model inherits the flexibility of non-parametric methods, the estimation procedure based on the EM algorithm is still parametric in computation, and thus simple and easy to implement. The computation is simplified by dimension reduction for random coefficients or FPC scores. An iterative selection procedure based on Akaike information criterion (AIC) is proposed to choose the tuning parameters, such as the knots of spline basis and the number of FPCs, so that appropriate degree of smoothness and fluctuation can be addressed. The effectiveness of the proposed approach is illustrated through a simulation study, followed by an application to longitudinal CD4 counts and survival data which were collected in a recent clinical trial to compare the efficiency and safety of two antiretroviral drugs.  相似文献   

13.
Sliced Inverse Regression (SIR; 1991) is a dimension reduction method for reducing the dimension of the predictors without losing regression information. The implementation of SIR requires inverting the covariance matrix of the predictors—which has hindered its use to analyze high-dimensional data where the number of predictors exceed the sample size. We propose random sliced inverse regression (rSIR) by applying SIR to many bootstrap samples, each using a subset of randomly selected candidate predictors. The final rSIR estimate is obtained by aggregating these estimates. A simple variable selection procedure is also proposed using these bootstrap estimates. The performance of the proposed estimates is studied via extensive simulation. Application to a dataset concerning myocardial perfusion diagnosis from cardiac Single Proton Emission Computed Tomography (SPECT) images is presented.  相似文献   

14.
15.
The detection of influential observations on the estimation of the dimension reduction subspace returned by Sliced Inverse Regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.  相似文献   

16.
马少沛等 《统计研究》2021,38(2):114-134
在大数据时代,金融学、基因组学和图像处理等领域产生了大量的张量数据。Zhong等(2015)提出了张量充分降维方法,并给出了处理二阶张量的序列迭代算法。鉴于高阶张量在实际生活中的广泛应用,本文将Zhong等(2015)的算法推广到高阶,以三阶张量为例,提出了两种不同的算法:结构转换算法和结构保持算法。两种算法都能够在不同程度上保持张量原有结构信息,同时有效降低变量维度和计算复杂度,避免协方差矩阵奇异的问题。将两种算法应用于人像彩图的分类识别,以二维和三维点图等形式直观展现了算法分类结果。将本文的结构保持算法与K-means聚类方法、t-SNE非线性降维方法、多维主成分分析、多维判别分析和张量切片逆回归共五种方法进行对比,结果表明本文所提方法在分类精度方面有明显优势,因此在图像识别及相关应用领域具有广阔的发展前景。  相似文献   

17.
Wavelet kernel penalized estimation for non-equispaced design regression   总被引:2,自引:0,他引:2  
The paper considers regression problems with univariate design points. The design points are irregular and no assumptions on their distribution are imposed. The regression function is retrieved by a wavelet based reproducing kernel Hilbert space (RKHS) technique with the penalty equal to the sum of blockwise RKHS norms. In order to simplify numerical optimization, the problem is replaced by an equivalent quadratic minimization problem with an additional penalty term. The computational algorithm is described in detail and is implemented with both the sets of simulated and real data. Comparison with existing methods showed that the technique suggested in the paper does not oversmooth the function and is superior in terms of the mean squared error. It is also demonstrated that under additional assumptions on design points the method achieves asymptotic optimality in a wide range of Besov spaces.  相似文献   

18.
Jing Yang  Fang Lu  Hu Yang 《Statistics》2013,47(6):1193-1211
The outer product of gradients (OPG) estimation procedure based on least squares (LS) approach has been presented by Xia et al. [An adaptive estimation of dimension reduction space. J Roy Statist Soc Ser B. 2002;64:363–410] to estimate the single-index parameter in partially linear single-index models (PLSIM). However, its asymptotic property has not been established yet and the efficiency of LS-based method can be significantly affected by outliers and heavy-tailed distributions. In this paper, we firstly derive the asymptotic property of OPG estimator developed by Xia et al. [An adaptive estimation of dimension reduction space. J Roy Statist Soc Ser B. 2002;64:363–410] in theory, and a novel robust estimation procedure combining the ideas of OPG and local rank (LR) inference is further developed for PLSIM along with its theoretical property. Then, we theoretically derive the asymptotic relative efficiency (ARE) of the proposed LR-based procedure with respect to LS-based method, which is shown to possess an expression that is closely related to that of the signed-rank Wilcoxon test in comparison with the t-test. Moreover, we demonstrate that the new proposed estimator has a great efficiency gain across a wide spectrum of non-normal error distributions and almost not lose any efficiency for the normal error. Even in the worst case scenarios, the ARE owns a lower bound equalling to 0.864 for estimating the single-index parameter and a lower bound being 0.8896 for estimating the nonparametric function respectively, versus the LS-based estimators. Finally, some Monte Carlo simulations and a real data analysis are conducted to illustrate the finite sample performance of the estimators.  相似文献   

19.
To characterize the dependence of a response on covariates of interest, a monotonic structure is linked to a multivariate polynomial transformation of the central subspace (CS) directions with unknown structural degree and dimension. Under a very general semiparametric model formulation, such a sufficient dimension reduction (SDR) score is shown to enjoy the existence, optimality, and uniqueness up to scale and location in the defined concordance probability function. In light of these properties and its single-index representation, two types of concordance-based generalized Bayesian information criteria are constructed to estimate the optimal SDR score and the maximum concordance index. The estimation criteria are further carried out by effective computational procedures. Generally speaking, the outer product of gradients estimation in the first approach has an advantage in computational efficiency and the parameterization system in the second approach greatly reduces the number of parameters in estimation. Different from most existing SDR approaches, only one CS direction is required to be continuous in the proposals. Moreover, the consistency of structural degree and dimension estimators and the asymptotic normality of the optimal SDR score and maximum concordance index estimators are established under some suitable conditions. The performance and practicality of our methodology are also investigated through simulations and empirical illustrations.  相似文献   

20.
The aim of this paper is to define and develop diagnostic measures with respect to kernel ridge regression in a reproducing kernel Hilbert space (RKHS). To identify influential observations, we define a particular version of Cook’s distance for the kernel ridge regression model in RKHS, which is conceptually consistent with Cook’s distance in a classical regression model. Then, by using the perturbation formula for the regularized conditional expectation of the outcome in RKHS, we develop an approximate version of Cook”s distance in RKHS because the original definition requires intensive computations. Such an approximated Cook”s distance is represented in terms of basic building blocks such as residuals and leverages of the kernel ridge regression. The results of the simulation and real application demonstrate that our diagnostic measure successfully detects potentially influential observations on estimators in kernel ridge regression.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号