Similar literature
 20 similar documents found
1.
Most methods for survival prediction from high-dimensional genomic data combine the Cox proportional hazards model with some technique of dimension reduction, such as partial least squares regression (PLS). Applying PLS to the Cox model is not entirely straightforward, and multiple approaches have been proposed. The method of Park et al. (Bioinformatics 18(Suppl. 1):S120–S127, 2002) uses a reformulation of the Cox likelihood to a Poisson type likelihood, thereby enabling estimation by iteratively reweighted partial least squares for generalized linear models. We propose a modification of the method of Park et al. (2002) such that estimates of the baseline hazard and the gene effects are obtained in separate steps. The resulting method has several advantages over the method of Park et al. (2002) and other existing Cox PLS approaches, as it allows for estimation of survival probabilities for new patients, enables a less memory-demanding estimation procedure, and allows for incorporation of lower-dimensional non-genomic variables like disease grade and tumor thickness. We also propose to combine our Cox PLS method with an initial gene selection step in which genes are ordered by their Cox score and only the highest-ranking k% of the genes are retained, obtaining a so-called supervised partial least squares regression method. In simulations, both the unsupervised and the supervised version outperform other Cox PLS methods.

2.
We compare partial least squares (PLS) and principal component analysis (PCA) in a general case in which the existence of a true linear regression is not assumed. We prove under mild conditions that PLS and PCA are equivalent, to within a first-order approximation, hence providing a theoretical explanation for empirical findings reported by other researchers. Next, we assume the existence of a true linear regression equation and obtain asymptotic formulas for the bias and variance of the PLS parameter estimator.

3.
4.
5.
In many complex diseases such as cancer, a patient undergoes various disease stages before reaching a terminal state (say, disease-free or death). This fits a multistate model framework where a prognosis may be equivalent to predicting the state occupation at a future time t. With the advent of high-throughput genomic and proteomic assays, a clinician may intend to use such high-dimensional covariates to make better predictions of state occupation. In this article, we offer a practical solution to this problem by combining a useful technique, called pseudo-value (PV) regression, with a latent factor or a penalized regression method such as partial least squares (PLS) or the least absolute shrinkage and selection operator (LASSO), or their variants. We explore the predictive performance of these combinations in various high-dimensional settings via extensive simulation studies. Overall, this strategy works fairly well provided the models are tuned properly. PLS turns out to be slightly better than LASSO in most settings we investigated, for the purpose of temporal prediction of future state occupation. We illustrate the utility of these PV-based high-dimensional regression methods using a lung cancer data set where we use the patients' baseline gene expression values.

6.
7.
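Industrialization and urbanization have deepened the dependence of economic development on energy, making the effect of energy constraints on economic growth increasingly prominent. Starting from a general specification of the production function, the paper derives a formula for measuring the drag effect of energy constraints on economic growth. By building extended C-D and CES production function models and estimating them with partial least squares (PLS) regression, it empirically tests the drag effect of energy constraints on economic growth in Chongqing over 1978-2011. The study finds that, under the C-D and CES production functions, the drag coefficients of energy constraints on economic growth are as high as 5.06% and 4.53%, respectively; the drag effect is highly significant, confirming the strong dependence of economic development on energy consumption. Accordingly, policy recommendations covering industrial restructuring, energy-mix adjustment, technological innovation, and human capital development are offered to promote the transformation of economic development and to coordinate economic growth with energy conservation and emission reduction.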

8.
Regularization is a well-known and used statistical approach covering individual points or limit approximations. In this study, the canonical correlation analysis (CCA) process of the paths is discussed with partial least squares (PLS) as the other boundary covering transformation to a symmetric eigenvalue (or singular value) problem dependent on a parameter. Two regularizations of the original criterion in the parameterization domain are compared, i.e. using projection and by identity matrix. We discuss the existence and uniqueness of the analytic path for eigenvalues and corresponding elements of eigenvectors. Specifically, canonical analysis is applied to an ill-conditioned case of singular within-sets input matrices encompassing tourism accommodation data.
KEYWORDS: Multivariate analysis, canonical correlation analysis, optimization, analytic decomposition, paths of eigenvalues and eigenvectors, tourism
MSC Classifications: 62H20, 46N10, 62P20

9.
This study provides an alternative approach that takes account of the unobserved effects of each seller under a sample selection framework while using online auction data. We use data collected from Yahoo! Kimo Auction (Taiwan) to demonstrate that earlier empirical results of online auction studies may be biased because they violate the assumption of independence of the error terms between sample observations. Empirical findings show that seller reputation is no longer the most important factor for buyers bidding on items, while the sample data confirm the unobserved heterogeneity of sellers and the sample selection problem.

10.
We define a parametric proportional odds frailty model to describe lifetime data incorporating heterogeneity between individuals. An unobserved individual random effect, called frailty, acts multiplicatively on the odds of failure by time t. We investigate fitting by maximum likelihood and by least squares. For the latter, the parametric survivor function is fitted to the nonparametric Kaplan–Meier estimate at the observed failure times. Bootstrap standard errors and confidence intervals are obtained for the least squares estimates. The models are applied successfully to simulated data and to two real data sets. Least squares estimates appear to have smaller bias than maximum likelihood.
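
The least squares idea in this abstract (matching a parametric survivor function to the Kaplan–Meier estimate at the observed failure times) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data are invented, and an exponential survivor function stands in for the proportional odds frailty model, whose exact form the abstract does not give.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed failure time.

    times  -- observed times (failure or censoring)
    events -- 1 if the time is an observed failure, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # subjects still under observation at t
        survival *= 1.0 - deaths[t] / at_risk      # product-limit update
        curve.append((t, survival))
    return curve

def least_squares_criterion(curve, survivor):
    """Sum of squared gaps between the KM estimate and a parametric survivor function."""
    return sum((s - survivor(t)) ** 2 for t, s in curve)

# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)

# Stand-in parametric family (an assumption, not the paper's model): S(t) = exp(-rate * t).
# A crude grid search picks the rate with the smallest criterion.
candidate_rates = [r / 100 for r in range(1, 101)]
best_rate = min(
    candidate_rates,
    key=lambda r: least_squares_criterion(km_curve, lambda t: math.exp(-r * t)),
)
```

In practice a numerical optimizer would replace the grid search, and the bootstrap standard errors mentioned in the abstract would come from repeating the fit on resampled data.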

11.
Partial least squares (PLS) is a class of methods for modeling relations between sets of observed variables by using latent components when the predictors are highly collinear. SIMPLS is a commonly used PLS algorithm that calculates the latent components directly as linear combinations of the original variables. However, SIMPLS is known to be very sensitive to outliers since it is based on the empirical cross-covariance matrix. RoPLS is a recently proposed iterative method for robust SIMPLS. In this article, the influence function for the RoPLS coefficient estimator is derived. It is demonstrated that under certain conditions, the RoPLS estimator has infinitesimal robustness.

12.
Abstract

We propose a simple procedure based on an existing “debiased” l1-regularized method for inference of the average partial effects (APEs) in approximately sparse probit and fractional probit models with panel data, where the number of time periods is fixed and small relative to the number of cross-sectional observations. Our method is computationally simple and does not suffer from the incidental parameters problem that comes from attempting to estimate as a parameter the unobserved heterogeneity for each cross-sectional unit. Furthermore, it is robust to arbitrary serial dependence in underlying idiosyncratic errors. Our theoretical results illustrate that inference concerning APEs is more challenging than inference about fixed and low-dimensional parameters, as the former concerns deriving the asymptotic normality for sample averages of linear functions of a potentially large set of components in our estimator when a series approximation for the conditional mean of the unobserved heterogeneity is considered. Insights on the applicability and implications of other existing Lasso-based inference procedures for our problem are provided. We apply the debiasing method to estimate the effects of spending on test pass rates. Our results show that spending has a positive and statistically significant average partial effect; moreover, the effect is comparable to that found using standard parametric methods.

13.
The article develops a semiparametric estimation method for the bivariate count data regression model. We develop a series expansion approach in which dependence between count variables is introduced by means of stochastically related unobserved heterogeneity components, and in which, unlike existing commonly used models, positive as well as negative correlations are allowed. Extensions that accommodate excess zeros, censored data, and multivariate generalizations are also given. Monte Carlo experiments and an empirical application to tobacco use confirm that the model performs well relative to existing bivariate models, in terms of various statistical criteria and in capturing the range of correlation among dependent variables. This article has supplementary materials online.

14.
Abstract

In this article, we propose a new regression method called general composite quantile regression (GCQR), which relaxes the unrealistic finite error variance assumption imposed by the traditional least squares (LS) method. Unlike the recently proposed composite quantile regression (CQR) method, our proposed GCQR allows any continuous non-uniform density/weight function. As a result, determination of the number of uniform quantile positions is not required. Most importantly, the proposed GCQR criterion can be readily transformed to a linear programming problem, which substantially reduces the computing time. Our theoretical and empirical results show that the GCQR is generally more efficient than the CQR and LS if the weight function is appropriately chosen. The oracle properties of the penalized GCQR are also provided. Our simulation results are consistent with the derived theoretical findings. A real data example is analyzed to demonstrate our methodologies.
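
For orientation, the ordinary CQR criterion that this abstract contrasts GCQR with can be written down in a few lines. The sketch below is not the GCQR method itself: it assumes the usual CQR convention of K equally spaced quantile levels tau_k = k/(K+1), one intercept per level and a shared slope, and it uses a single scalar predictor for brevity. All function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def cqr_objective(xs, ys, slope, intercepts):
    """Composite quantile regression criterion for one scalar predictor.

    Sums the check loss over K equally spaced quantile levels
    tau_k = k / (K + 1), each with its own intercept and a common slope.
    """
    K = len(intercepts)
    taus = [(k + 1) / (K + 1) for k in range(K)]
    return sum(
        check_loss(y - b - slope * x, tau)
        for tau, b in zip(taus, intercepts)
        for x, y in zip(xs, ys)
    )

# Hypothetical noise-free data on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
perfect_fit = cqr_objective(xs, ys, slope=2.0, intercepts=[1.0, 1.0, 1.0])
worse_fit = cqr_objective(xs, ys, slope=1.0, intercepts=[1.0, 1.0, 1.0])
```

Because the check loss is piecewise linear, minimizing a criterion of this kind can be posed as a linear program, which is the property the abstract highlights for GCQR.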

15.

Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question of why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect variables that contribute most to an outlier’s outlyingness. Thereby, it helps the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well both on simulated data and real life examples.


16.
In this article, we estimate structural labor supply with piecewise-linear budgets and nonseparable endogenous unobserved heterogeneity. We propose a two-stage method to address the endogeneity issue that comes from the correlation between the covariates and unobserved heterogeneity. In the first stage, Evdokimov’s nonparametric de-convolution method serves to identify the conditional distribution of unobserved heterogeneity from the quasi-reduced model that uses panel data. In the second stage, the conditional distribution is plugged into the original structural model to estimate labor supply. We apply this methodology to estimate the labor supply of U.S. married men in 2004 and 2005. Our empirical work demonstrates that ignoring the correlation between the covariates and unobserved heterogeneity will bias the estimates of wage elasticities upward. The labor elasticity estimated from a fixed effects model is less than half of that obtained from a random effects model.

17.
This paper reviews various treatments of non-metric variables in partial least squares (PLS) and principal component analysis (PCA) algorithms. The performance of different treatments is compared in an extensive simulation study under several typical data generating processes and associated recommendations are made. Moreover, we find that PLS-based methods are preferable in practice, since, independent of the data generating process, PLS either performs as well as PCA or significantly outperforms it. As an application of PLS and PCA algorithms with non-metric variables we consider construction of a wealth index to predict household expenditures. Consistent with our simulation study, we find that a PLS-based wealth index with dummy coding outperforms PCA-based ones.

18.
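Zhu Huiming et al. 《统计研究》 (Statistical Research), 2014, 31(7): 97-104
To address the censored-variable bias (删失变量偏差) and invalid inference caused by assuming that unobserved heterogeneity is time-invariant, the paper constructs a Bayesian hidden Markov heterogeneous panel model that characterizes dynamic, time-varying unobserved heterogeneity across cross-sectional units and diagnoses hidden change points that may exist in the economic environment. A corresponding Markov chain Monte Carlo sampling algorithm is designed to estimate the model parameters, and the relationship between financial development and the urban-rural income gap across China's regions is analyzed empirically. The analysis captures hidden changes in the long-run stable relationship between financial development and the urban-rural income gap and finds dynamic, time-varying features in the unobserved heterogeneity of individual regions. The results show that the iteration traces of all parameters converge and that the estimation errors are very small, supporting the validity of the Bayesian hidden Markov heterogeneous panel model.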

19.
In this paper we discuss the partial least squares (PLS) prediction method. The method is compared to the predictor based on principal component regression (PCR). Both theoretical considerations and computations on artificial and real data are presented.
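
The difference between the two predictors compared here can be illustrated with a single latent component. The sketch below is a simplified illustration, not the paper's procedure: it assumes centered data, uses the standard fact that the first PLS weight vector is proportional to X'y while the first PCR direction is the leading eigenvector of X'X (found here by power iteration), and runs on invented toy data. All names are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract each column mean from a list-of-rows matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def pls_direction(X, y):
    """First PLS weight vector: proportional to X'y, so it uses the response."""
    p = len(X[0])
    return normalize([sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)])

def pca_direction(X, iterations=200):
    """First principal direction via power iteration on X'X; it ignores the response."""
    p = len(X[0])
    v = normalize([1.0] * p)
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]
        v = normalize([sum(row[j] * s for row, s in zip(X, scores)) for j in range(p)])
    return v

def one_component_coefficients(X, y, direction):
    """Regress y on the score t = Xw and map back: beta = w * (t'y / t't)."""
    t = [sum(a * b for a, b in zip(row, direction)) for row in X]
    q = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return [w * q for w in direction]

# Hypothetical toy data: five observations, two correlated predictors.
raw_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
raw_y = [1.0, 1.5, 3.5, 3.0, 5.5]
X = center_columns(raw_X)
y_mean = sum(raw_y) / len(raw_y)
y = [v - y_mean for v in raw_y]

w_pls = pls_direction(X, y)
w_pcr = pca_direction(X)
beta_pls = one_component_coefficients(X, y, w_pls)
beta_pcr = one_component_coefficients(X, y, w_pcr)
```

With more components, both methods would deflate X and repeat the step; the sketch stops at one component to keep the contrast between the two directions visible.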

20.
This article addresses the problem of the bias of income and expenditure elasticities estimated on pseudo-panel data caused by measurement error and unobserved heterogeneity. We gauge these biases empirically by comparing cross-sectional, pseudo-panel, and true panel data from both Polish and U.S. expenditure surveys. Our results suggest that unobserved heterogeneity imparts a downward bias to cross-section estimates of income elasticities of at-home food expenditures and an upward bias to estimates of income elasticities of away-from-home food expenditures. “Within” and first-difference estimators suffer less bias, but only if the effects of measurement error are accounted for with instrumental variables.
