Similar Literature
Found 20 similar documents (search time: 31 ms)
1.
Sliced inverse regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The original method, however, requires inverting the predictors' covariance matrix. When the predictors are collinear, or when the sample size is small relative to the dimension, this inversion is not possible and a regularization technique must be used. Our approach is based on a Fisher Lecture given by R.D. Cook, in which it is shown that the SIR axes can be interpreted as solutions of an inverse regression problem. We propose to place a Gaussian prior distribution on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that several existing SIR regularizations fit within our framework, which permits a global understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method. A comparison on simulated data as well as an application to the estimation of Mars surface physical properties from hyperspectral images are provided.
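A ridge-type regularization of the kind discussed above can be sketched as follows. This is a minimal illustration in which the Gaussian prior reduces to a Tikhonov term `lam * I` added to the sample covariance before inversion; the function name and the choice of `lam` are ours, not the paper's.

```python
import numpy as np

def regularized_sir(X, y, n_slices=10, lam=0.1, n_dirs=1):
    """SIR with a ridge-regularized covariance inverse: the prior shows up
    as the lam * I term added to the sample covariance before whitening."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n
    # Regularized whitening: (sigma + lam * I)^(-1/2)
    w, V = np.linalg.eigh(sigma + lam * np.eye(p))
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ inv_sqrt
    # Weighted covariance of the slice means of the whitened predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors, mapped back to the original predictor scale
    evecs = np.linalg.eigh(M)[1][:, ::-1][:, :n_dirs]
    dirs = inv_sqrt @ evecs
    return dirs / np.linalg.norm(dirs, axis=0)
```

On data simulated from a single-index model, the leading direction aligns closely with the true index vector even when `lam` is small.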

2.
We propose a new method for dimension reduction in regression using the first two inverse moments. We develop corresponding weighted chi-squared tests for the dimension of the regression. The proposed method considers linear combinations of sliced inverse regression (SIR) and a method based on a new candidate matrix designed to recover the entire inverse second-moment subspace. The optimal combination may be selected based on the p-values derived from the dimension tests. Theoretically, the proposed method, like sliced average variance estimation (SAVE), is more capable of recovering the complete central dimension-reduction subspace than SIR and principal Hessian directions (pHd). It can therefore substitute for SIR, pHd, SAVE, or any linear combination of them at a theoretical level. A simulation study indicates that the proposed method may have consistently greater power than SIR, pHd, and SAVE.
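The idea of combining a first-moment (SIR) candidate matrix with a second-moment one can be sketched as below. The fixed mixing weight `alpha` is a hypothetical stand-in for the paper's p-value-driven choice, and the second-moment matrix used here is the classical SAVE candidate rather than the paper's new matrix.

```python
import numpy as np

def sir_save_combo(X, y, alpha=0.5, n_slices=8, n_dirs=1):
    """Linear combination of the SIR (first-moment) and SAVE
    (second-moment) candidate matrices; alpha is an illustrative
    mixing weight, not the p-value-based selection of the paper."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ inv_sqrt
    M_sir = np.zeros((p, p))
    M_save = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        f = len(idx) / n
        m = Z[idx].mean(axis=0)
        M_sir += f * np.outer(m, m)            # slice means (first moment)
        Vh = np.cov(Z[idx], rowvar=False, bias=True)
        D = np.eye(p) - Vh                     # slice variances (second moment)
        M_save += f * D @ D
    M = alpha * M_sir + (1 - alpha) * M_save
    evecs = np.linalg.eigh(M)[1][:, ::-1][:, :n_dirs]
    dirs = inv_sqrt @ evecs
    return dirs / np.linalg.norm(dirs, axis=0)
```

In a symmetric model such as y = (x'beta)^2 + noise, the SIR component alone degenerates, while the second-moment component still recovers the direction.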

3.

We propose a semiparametric framework based on sliced inverse regression (SIR) to address variable selection in functional regression. SIR is an effective method for dimension reduction which computes a linear projection of the predictors onto a low-dimensional space without loss of information on the regression. To deal with the high dimensionality of the predictors, we consider penalized versions of SIR: ridge and sparse. We extend the variable-selection approaches developed for multidimensional SIR to select intervals that form a partition of the definition domain of the functional predictors. Selecting entire intervals rather than isolated evaluation points improves the interpretability of the estimated coefficients in the functional framework. A fully automated iterative procedure is proposed to find the critical (interpretable) intervals. The approach proves effective on simulated and real data. The method is implemented in the R package SISIR, available on CRAN at https://cran.r-project.org/package=SISIR.


4.
李向杰 et al., 《统计研究》 (Statistical Research), 2018, 35(7): 115-124
Classical sufficient dimension reduction methods perform poorly when the predictors contain outliers or follow heavy-tailed distributions. Building on an analysis of weighting and cumulative slicing in sufficient dimension reduction theory, this paper proposes a robust dimension-reduction method that organically combines the two: cumulative weighted sliced inverse regression (CWSIR). The method is robust to outliers in the predictors and in small-sample settings, and it effectively avoids the choice of the number of slices. Numerical simulations show that CWSIR outperforms traditional sliced inverse regression (SIR), cumulative slicing estimation (CUME), contour-projection-based sliced inverse regression (CPSIR), weighted canonical correlation estimation (WCANCOR), sliced inverse median estimation (SIME), weighted inverse regression estimation (WIRE), and other methods. Finally, an analysis of real data from a video website also confirms the effectiveness of CWSIR.

5.
Sliced inverse regression (SIR) is a dimension reduction technique that is both efficient and simple to implement. The procedure itself relies heavily on estimates that are known to be highly non-robust and, as such, the issue of robustness is often raised. This paper looks at the robustness of SIR by deriving and plotting the influence function for a variety of contamination structures. The sample influence function is also considered and used to highlight that common outlier detection and deletion methods may not be entirely useful to SIR. The asymptotic variance of the estimates is also derived for the single index model when the explanatory variable is known to be normally distributed. The asymptotic variance is then compared for varying choices of the number of slices for a simple model example.

6.

A simple method based on sliced inverse regression (SIR) is proposed to explore an effective dimension reduction (EDR) vector for the single-index model. We avoid the principal component analysis step of the original SIR by using the two sample mean vectors in two slices of the response variable and their difference vector. The theory becomes simpler, the method is equivalent to multiple linear regression with a dichotomized response, and the estimator can be expressed in closed form, even though the link function may be unknown and nonlinear. The method can be applied when the number of covariates is large, and it requires no iterative calculation.
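A minimal sketch of the two-slice idea: split the response at a point and take the difference of the two predictor slice means. The median split and the explicit covariance solve below are our illustrative choices; the paper's exact closed form may normalize differently.

```python
import numpy as np

def two_slice_sir(X, y):
    """Closed-form EDR estimate from two response slices, equivalent to
    regressing a dichotomized response on X (illustrative sketch)."""
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / len(y)
    high = y > np.median(y)
    # Difference of the predictor means in the two slices
    diff = Xc[high].mean(axis=0) - Xc[~high].mean(axis=0)
    b = np.linalg.solve(sigma, diff)   # covariance-adjusted direction
    return b / np.linalg.norm(b)
```

For a monotone single-index model the returned vector is, up to sign, a consistent estimate of the index direction.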

7.
Sliced inverse regression (SIR) was developed to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR (ISOSIR), a hybrid of SIR and geodesic distance approximation, for nonlinear dimension reduction. First, the proposed method computes the isometric (geodesic) distances between data points; the resulting distance matrix is then sliced according to K-means clustering results, and the classical SIR algorithm is applied. We show that ISOSIR can reveal the geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We report and discuss this novel method in comparison with several existing dimension-reduction techniques for data visualization and classification problems. The results show that ISOSIR is a promising nonlinear feature extractor for classification applications.

8.
Several variations of monotone nonparametric regression have been developed over the past 30 years. One approach is to first apply nonparametric regression to the data and then monotone smooth the initial estimates to "iron out" violations of the assumed order. Here, such estimators are considered, where local polynomial regression is applied first, followed by either least-squares isotonic regression or a monotone method using simple averages. The primary focus of this work is to evaluate different types of confidence intervals for these monotone nonparametric regression estimators through Monte Carlo simulation. Most of the confidence intervals use bootstrap or jackknife procedures. Estimation of a response variable as a function of two continuous predictor variables is considered, where the estimation is performed at the observed values of the predictors (instead of on a grid). The methods are then applied to data on subjects who worked at plants that use beryllium metal and who developed chronic beryllium disease.
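The two-step "fit, then iron out" scheme can be sketched as follows, with a local-constant (moving-average) smoother standing in for local polynomial regression and pool-adjacent-violators (PAVA) for the isotonic step; the bandwidth and function names are illustrative.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares isotonic regression."""
    blocks_v, blocks_w = [], []
    for v in y:
        blocks_v.append(float(v))
        blocks_w.append(1.0)
        # Merge backwards while the monotonicity constraint is violated
        while len(blocks_v) > 1 and blocks_v[-2] > blocks_v[-1]:
            w = blocks_w[-2] + blocks_w[-1]
            merged = (blocks_v[-2] * blocks_w[-2] + blocks_v[-1] * blocks_w[-1]) / w
            blocks_v[-2:] = [merged]
            blocks_w[-2:] = [w]
    out = []
    for v, w in zip(blocks_v, blocks_w):
        out.extend([v] * int(round(w)))
    return np.array(out)

def monotone_smooth(x, y, bandwidth=0.2):
    """Local-average fit followed by isotonic 'ironing' (a local-constant
    stand-in for the local polynomial first stage discussed above)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    fit = np.array([ys[np.abs(xs - xi) <= bandwidth].mean() for xi in xs])
    return xs, pava(fit)
```

The returned fit is nondecreasing by construction, whatever order violations the first-stage smoother produces.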

9.
Recent work has shown that the presence of ties between an outcome event and the time that a binary covariate changes or jumps can lead to biased estimates of regression coefficients in the Cox proportional hazards model. One proposed solution is the Equally Weighted method. The coefficient estimate of the Equally Weighted method is defined to be the average of the coefficient estimates of the Jump Before Event method and the Jump After Event method, where these two methods assume that the jump always occurs before or after the event time, respectively. In previous work, the bootstrap method was used to estimate the standard error of the Equally Weighted coefficient estimate. However, the bootstrap approach was computationally intensive and resulted in overestimation. In this article, two new methods for the estimation of the Equally Weighted standard error are proposed. Three alternative methods for estimating both the regression coefficient and the corresponding standard error are also proposed. All the proposed methods are easy to implement. The five methods are investigated using a simulation study and are illustrated using two real datasets.

10.
A method for nonparametric estimation of density based on a randomly censored sample is presented. The density is expressed as a linear combination of cubic M-splines, and the coefficients are determined by pseudo-maximum-likelihood estimation (likelihood is maximized conditionally on data-dependent knots). By using regression splines (small number of knots) it is possible to reduce the estimation problem to a space of low dimension while preserving flexibility, thus striking a compromise between parametric approaches and ordinary nonparametric approaches based on spline smoothing. The number of knots is determined by the minimum AIC. Examples of simulated and real data are presented. Asymptotic theory and the bootstrap indicate that the precision and the accuracy of the estimates are satisfactory.

11.

Parameter reduction can enable otherwise infeasible design and uncertainty studies with modern computational science models that contain several input parameters. In statistical regression, techniques for sufficient dimension reduction (SDR) use data to reduce the predictor dimension of a regression problem. A computational scientist hoping to use SDR for parameter reduction encounters a problem: a computer prediction is best represented by a deterministic function of the inputs, so data composed of computer simulation queries fail to satisfy the SDR assumptions. To address this problem, we interpret the SDR methods sliced inverse regression (SIR) and sliced average variance estimation (SAVE) as estimating the directions of a ridge function, which is a composition of a low-dimensional linear transformation with a nonlinear function. Within this interpretation, SIR and SAVE estimate matrices of integrals whose column spaces are contained in the ridge directions' span; we analyze and numerically verify convergence of these column spaces as the number of computer model queries increases. Moreover, we show example functions that are not ridge functions but whose inverse conditional moment matrices are low-rank. Consequently, the computational scientist should beware when using SIR and SAVE for parameter reduction, since SIR and SAVE may mistakenly suggest that truly important directions are unimportant.
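As a sketch of the setting above, classical SIR applied to noise-free queries of a deterministic ridge function recovers the ridge direction. The model f(x) = exp(a'x) and the Gaussian input design are our illustrative choices, not the paper's examples.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Standard SIR, interpreted here as estimating ridge directions
    from deterministic computer-model queries y = f(X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc / n)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ inv_sqrt
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    dirs = inv_sqrt @ np.linalg.eigh(M)[1][:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)

# Deterministic "computer model": a ridge function of a 5-d input
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
a = np.array([3.0, 4.0, 0.0, 0.0, 0.0]) / 5.0   # unit ridge direction
y = np.exp(X @ a)   # no noise: every query is a deterministic evaluation
```

As the number of queries grows, the leading estimated direction converges to the span of `a`, consistent with the column-space convergence analyzed in the paper.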


12.
Sliced inverse regression (SIR) is an effective method for dimensionality reduction in high-dimensional regression problems. However, the method imposes requirements on the distribution of the predictors that are hard to check, since they depend on unobserved variables. It has been shown that if the distribution of the predictors is elliptical, then these requirements are satisfied. In the case of mixture models, ellipticity is violated, and in addition there is no assurance of a single underlying regression model across the different components. Our approach clusters the predictor space to force the condition to hold on each cluster, and includes a merging technique to look for different underlying models in the data. A study on simulated data as well as two real applications are provided. It appears that SIR, unsurprisingly, is not capable of dealing with a mixture of Gaussians involving different underlying models, whereas our approach is able to correctly investigate the mixture.

13.
Because sliced inverse regression (SIR), which uses the conditional mean of the inverse regression, fails to recover the central subspace when the inverse regression mean degenerates, sliced average variance estimation (SAVE), which uses the conditional variance, was proposed in the sufficient dimension reduction literature. However, the efficacy of SAVE depends heavily upon the number of slices. In the present article, we introduce a class of weighted variance estimation (WVE) methods which, similar to SAVE and simple contour regression (SCR), use the conditional variance of the inverse regression to recover the central subspace. The strong consistency and asymptotic normality of the kernel estimation of WVE are established under mild regularity conditions. Finite-sample studies are carried out for comparison with existing methods, and an application to real data is presented for illustration.

14.
Based on the theories of sliced inverse regression (SIR) and reproducing kernel Hilbert spaces (RKHS), a new approach, RDSIR (RKHS-based Double SIR), to nonlinear dimension reduction for survival data is proposed. An isometric isomorphism is constructed based on the RKHS property; the nonlinear function in the RKHS can then be represented by the inner product of two elements residing in the isomorphic feature space. Because survival data are censored, double slicing is used to estimate the weight function that adjusts for the censoring bias. The nonlinear sufficient dimension reduction (SDR) subspace is estimated via a generalized eigen-decomposition problem. The asymptotic property of the estimator is established based on perturbation theory. Finally, the performance of RDSIR is illustrated on simulated and real data. The numerical results show that RDSIR is comparable with linear SDR methods; most importantly, it can also effectively extract nonlinearity from survival data.

15.
In this paper we define a finite mixture of quantile and M-quantile regression models for heterogeneous and/or dependent/clustered data. Components of the finite mixture represent clusters of individuals with homogeneous values of the model parameters. Owing to their flexibility and ease of estimation, the proposed approaches can be extended to random coefficients of higher dimension than the simple random-intercept case. Estimation of model parameters is obtained through maximum likelihood, by implementing an EM-type algorithm. The standard error estimates for model parameters are obtained using the inverse of the observed information matrix, derived through the Oakes (J R Stat Soc Ser B 61:479-482, 1999) formula in the M-quantile setting, and through the nonparametric bootstrap in the quantile case. We present a large-scale simulation study to analyse the practical behaviour of the proposed models and to evaluate the empirical performance of the proposed standard error estimates, considering a variety of empirical settings in both the random-intercept and the random-coefficient case. The proposed modelling approaches are also applied to two well-known datasets, which give further insights into their empirical behaviour.

16.
The detection of influential observations on the estimation of the dimension reduction subspace returned by sliced inverse regression (SIR) is considered. Although there are many measures to detect influential observations in related methods such as multiple linear regression, there has been little development in this area with respect to dimension reduction. One particular influence measure for a version of SIR is examined and it is shown, via simulation and example, how this may be used to detect influential observations in practice.

17.
When variable selection with stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias in coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute value of the parameter estimates of selected variables, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard error of the adjusted estimators. Simulation results show that substantial biases can be present in uncorrected stepwise estimators and, for binary covariates, can exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, is moved closer both to the unconditional mean of the standard partial likelihood estimator in the chosen model and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with datasets on primary biliary cirrhosis and multiple myeloma from the Eastern Cooperative Oncology Group.
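The basic bootstrap bias-correction step underlying such methods can be sketched generically. Here it is applied to a deliberately biased variance estimator rather than to stepwise Cox selection, and the function name is ours, not the paper's.

```python
import numpy as np

def bootstrap_bias_correct(data, estimator, n_boot=500, seed=0):
    """Generic bootstrap bias correction: theta_corrected =
    2 * theta_hat - mean(theta_boot). Returns the corrected estimate
    and the bootstrap standard error."""
    rng = np.random.default_rng(seed)
    theta_hat = estimator(data)
    # Resample with replacement and re-apply the estimator
    boot = [estimator(rng.choice(data, size=len(data), replace=True))
            for _ in range(n_boot)]
    bias = np.mean(boot) - theta_hat
    return theta_hat - bias, np.std(boot, ddof=1)
```

Applied to the plug-in variance estimator (which is biased downward by a factor (n-1)/n), the correction moves the estimate close to the unbiased sample variance.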

18.
We propose bootstrap prediction intervals for an observation h periods into the future and its conditional mean. We assume that these forecasts are made using a set of factors extracted from a large panel of variables. Because we treat these factors as latent, our forecasts depend both on estimated factors and estimated regression coefficients. Under regularity conditions, asymptotic intervals have been shown to be valid under Gaussianity of the innovations. The bootstrap allows us to relax this assumption and to construct valid prediction intervals under more general conditions. Moreover, even under Gaussianity, the bootstrap leads to more accurate intervals in cases where the cross-sectional dimension is relatively small as it reduces the bias of the ordinary least-squares (OLS) estimator.

19.

This article considers nonparametric regression problems and develops a model-averaging procedure for smoothing spline regression. Unlike most smoothing parameter selection studies, which determine a single optimum smoothing parameter, our focus here is on prediction accuracy for the true conditional mean of Y given a predictor X. Our method consists of two steps. The first step is to construct a class of smoothing spline regression models based on nonparametric bootstrap samples, each with an appropriate smoothing parameter. The second step is to average the bootstrap smoothing spline estimates of different smoothness to form a final improved estimate. To minimize the prediction error, we estimate the model weights using a delete-one-out cross-validation procedure. A simulation study performed in R compares the well-known cross-validation (CV) and generalized cross-validation (GCV) criteria with the proposed method. The new method is straightforward to implement and gives reliable performance in simulations.

20.
A generalised regression estimation procedure is proposed that can lead to much improved estimation of population characteristics, such as quantiles, variances and coefficients of variation. The method involves conditioning on the discrepancy between an estimate of an auxiliary parameter and its known population value. The key distributional assumption is joint asymptotic normality of the estimates of the target and auxiliary parameters. This assumption implies that the relationship between the estimated target and the estimated auxiliary parameters is approximately linear, with coefficients determined by their asymptotic covariance matrix. The main contribution of this paper is the use of the bootstrap to estimate these coefficients, which avoids the need for parametric distributional assumptions. First-order correct conditional confidence intervals based on asymptotic normality can be improved upon using quantiles of a conditional double-bootstrap approximation to the distribution of the studentised target parameter estimate.
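The conditioning step described above can be sketched as follows: the linear adjustment coefficient b = Cov(target, aux) / Var(aux) is estimated from bootstrap replicates, and the target estimate is shifted by b times the observed auxiliary discrepancy. Function and argument names are illustrative.

```python
import numpy as np

def conditional_adjust(data, target_stat, aux_stat, aux_true,
                       n_boot=400, seed=0):
    """Adjust a target estimate by conditioning on the discrepancy between
    an auxiliary estimate and its known population value; the adjustment
    coefficient is estimated by the bootstrap (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    t_boot, a_boot = [], []
    for _ in range(n_boot):
        s = rng.choice(data, size=len(data), replace=True)
        t_boot.append(target_stat(s))
        a_boot.append(aux_stat(s))
    # Bootstrap estimate of Cov(target, aux) / Var(aux)
    b = np.cov(t_boot, a_boot)[0, 1] / np.var(a_boot, ddof=1)
    return target_stat(data) - b * (aux_stat(data) - aux_true)
```

For example, a sample median (target) can be adjusted using the sample mean (auxiliary) when the population mean is known, exploiting the strong correlation between the two estimates.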


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) | 京ICP备09084417号