Similar Documents (20 results)
1.
For semicontinuous two-part regression models, each regression component faces a large pool of candidate variables, which raises a variable selection problem. This paper studies variable selection for the Bernoulli-Normal two-part regression model. A selection method based on the Lasso penalty is proposed first; since the Lasso estimator lacks the oracle property, a second method based on the adaptive Lasso penalty is then proposed. Simulations show that both methods can select variables in the Bernoulli-Normal regression model, and that the adaptive Lasso method generally outperforms the Lasso method.
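A minimal sketch of the two-step adaptive lasso idea mentioned above, implemented by rescaling columns with pilot estimates. The `RidgeCV` pilot fit, the hyperparameters, and all names are illustrative assumptions, not the authors' procedure (which operates on a two-part model):

```python
# Adaptive lasso via column rescaling; scikit-learn assumed available.
import numpy as np
from sklearn.linear_model import Lasso, RidgeCV

def adaptive_lasso(X, y, gamma=1.0, alpha=0.1):
    """Two-step adaptive lasso: ridge pilot fit, then weighted lasso."""
    pilot = RidgeCV().fit(X, y)               # consistent pilot estimate
    w = np.abs(pilot.coef_) ** gamma + 1e-8   # adaptive weights |b_j|^gamma
    lasso = Lasso(alpha=alpha).fit(X * w, y)  # standard lasso on rescaled columns
    return lasso.coef_ * w                    # map back to the original scale
```

The rescaling trick works because substituting beta_j = theta_j * w_j turns the weighted L1 penalty into an ordinary one.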

2.
王小燕等 《统计研究》2014,31(9):107-112
Variable selection is a key step in statistical modeling: choosing suitable variables yields simple, accurate, and robust models. This paper proposes a new bi-level variable selection penalty for logistic regression, the adaptive Sparse Group Lasso (adSGL), whose distinctive feature is screening based on the group structure of the variables, performing selection both within and between groups. Its advantage is that individual coefficients and group coefficients receive different degrees of penalization, avoiding over-penalization of large coefficients and thereby improving estimation and prediction accuracy. Because the penalized likelihood is not strictly convex, the model is solved by group coordinate descent, and a criterion for choosing the tuning parameters is established. Simulations show that, compared with the representative existing methods Sparse Group Lasso, Group Lasso, and Lasso, adSGL improves bi-level selection accuracy and reduces model error. Finally, adSGL is applied to credit-card credit scoring, where it achieves higher classification accuracy and robustness than plain logistic regression.
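For reference, a sketch of a bi-level adSGL-style penalty value; the group layout, adaptive weight vectors `w_ind`/`w_grp`, and mixing parameter follow the usual sparse-group-lasso form and are assumptions, not the paper's exact formulation:

```python
import numpy as np

def adsgl_penalty(beta, groups, lam, alpha, w_ind, w_grp):
    """(1-alpha)*lam * sum_g w_grp[g]*||beta_g||_2 + alpha*lam * sum_j w_ind[j]*|beta_j|.

    groups: list of index arrays, one per group of coefficients.
    """
    group_part = sum(w_grp[g] * np.linalg.norm(beta[idx])
                     for g, idx in enumerate(groups))
    ind_part = np.sum(w_ind * np.abs(beta))
    return (1 - alpha) * lam * group_part + alpha * lam * ind_part
```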

3.
Forecasting Total Electricity Consumption in Jiangsu Province
This paper builds a principal component regression model for Jiangsu Province's total electricity demand. After the principal components are obtained, quadratic polynomial, cubic polynomial, and growth-curve models are selected according to the characteristics of the data, and total electricity consumption is regressed on the principal components to obtain forecasting models. Based on the requirements of Jiangsu's Eleventh Five-Year Plan, total electricity demand for 2006-2010 is then forecast.
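A hedged sketch of the principal-component-regression-plus-polynomial setup described above, using scikit-learn; the single retained component, the quadratic degree, and the variable names are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# X: economic indicators, y: total electricity consumption (assumed shapes)
pcr = make_pipeline(
    StandardScaler(),
    PCA(n_components=1),           # extract the leading principal component
    PolynomialFeatures(degree=2),  # quadratic trend in the component score
    LinearRegression(),
)
# pcr.fit(X, y); pcr.predict(X_future)
```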

4.
For the problem of simultaneously selecting fixed and random effects in mixed effects models, a regression procedure with multiple penalty terms is proposed, together with an alternating iterative algorithm for parameter estimation whose convergence is proved. Two special cases of the multi-penalty regression procedure are compared on simulated data; the results show that the new method performs well under a range of conditions and, in particular, can handle high-dimensional sparse mixed effects models. Finally, an application to real data demonstrates the new method.

5.
Using a principal component regression model, this paper applies principal component analysis to a complex set of economic and social development indicators, builds a multiple regression model of premium income on the principal components, and thereby identifies and quantifies the main economic and social factors currently driving premium income.

6.
Principal component analysis is a classical unsupervised data-processing tool, and sparse and supervised variants have attracted considerable attention in recent years. Based on orthogonal iteration and the distance correlation coefficient, this paper proposes a supervised sparse principal component analysis method, SSPCA. The method accounts for the correlation between the predictors and the response, and during the iterative solution it sets to zero the coefficients of predictors only weakly correlated with the response Y, so that the resulting eigenvectors retain only the predictors with strong predictive power. In simulations and a real-data analysis, SSPCA outperforms four competing methods.
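The screening statistic named above is the distance correlation; below is a compact sketch of the standard (biased, V-statistic) sample version. The implementation details are generic, not taken from the paper:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dist_corr(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a = squareform(pdist(x.reshape(-1, 1)))  # pairwise distances within x
    b = squareform(pdist(y.reshape(-1, 1)))  # pairwise distances within y
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()  # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                   # squared distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))
```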

7.
Diagnosing and Remedying Multicollinearity in the Logistic Model
This paper diagnoses and remedies multicollinearity in logistic regression models. Condition indices and variance decomposition proportions are used for diagnosis, and principal component regression and partial least squares regression are applied to treat the collinear variables, removing the multicollinearity among the model variables and yielding a satisfactory model. The results show that these diagnostic and treatment methods are effective and feasible in logistic regression analysis.
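The two diagnostics named above are the standard Belsley collinearity measures; a short sketch, assuming a column-scaled design matrix and generic names:

```python
import numpy as np

def collinearity_diagnostics(X):
    """Belsley condition indices and variance-decomposition proportions."""
    Xs = X / np.linalg.norm(X, axis=0)             # scale columns to unit length
    U, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = d.max() / d                         # condition indices
    phi = (Vt.T / d) ** 2                          # v_jk^2 / d_k^2
    props = phi / phi.sum(axis=1, keepdims=True)   # proportions per coefficient
    return cond_idx, props                         # large index + two large props => problem
```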

8.
方匡南  杨阳 《统计研究》2018,35(8):104-115
For classification problems, this paper proposes the sparse group lasso support vector machine (Sparse Group Lasso SVM, SGL-SVM), which introduces the SGL penalty into the SVM loss function so that between-group and within-group variable selection are performed simultaneously. Because the SGL-SVM objective is difficult to solve, a fast bilevel coordinate descent algorithm is also proposed. Simulations show that SGL-SVM outperforms competing methods in both prediction and variable selection; for data whose variables have a natural group structure with within-group sparsity, the method improves variable selection while also improving predictive accuracy. Finally, SGL-SVM is applied to financial distress prediction for listed Chinese manufacturing companies.
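A sketch of an SGL-penalized hinge objective (value only, no solver); the sqrt-of-group-size weights and the names are conventional assumptions, and the paper's actual solver is the bilevel coordinate descent mentioned above:

```python
import numpy as np

def sgl_svm_objective(beta, b, X, y, groups, lam, alpha):
    """Mean hinge loss plus a sparse-group-lasso penalty; y in {-1, +1}."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ beta + b)).mean()
    grp = sum(np.sqrt(len(idx)) * np.linalg.norm(beta[idx]) for idx in groups)
    return hinge + lam * ((1 - alpha) * grp + alpha * np.abs(beta).sum())
```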

9.
This paper uses principal component analysis to determine the model variables and builds multiple discriminant analysis (MDA), logistic regression, and improved BP neural network models for financial distress prediction. The results show that the neural network clearly outperforms MDA and logistic regression in prediction accuracy, while the latter two perform similarly, making the neural network the better choice for financial distress prediction. However, the long-run early-warning ability of all three models is unsatisfactory, so a new approach to financial distress prediction for listed companies is proposed in which quantitative models are primary and qualitative analysis is supplementary.

10.
田茂再  梅波 《统计研究》2019,36(8):114-128
Taking the structural features of functional data into account, this paper constructs new functional tilted quantile regression models, based on the definition of functional tilted quantile curves, for two classes of functional regression: a functional response on scalar predictors, and a functional response on functional predictors. For the second class, spline basis expansions of the model coefficients and functional principal component expansions of the functional predictors are considered, yielding the basic form of the tilted quantile regression model. Parameters are estimated by a component-wise gradient boosting algorithm that minimizes a weighted asymmetric loss function, improving computational efficiency. The coefficient estimators are shown to be asymptotically normal. Simulation and empirical results show that the tilted quantile regression model fits better than existing pointwise quantile regression models and, by the integrated mean squared prediction error criterion, has uniformly good predictive ability.
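The weighted asymmetric loss driving the boosting step has the familiar |tau - 1{u<0}| * |u|^q form. A tiny sketch covering both the quantile check loss (q = 1) and the expectile-type squared loss (q = 2), since the abstract does not pin down the power; the interface is an assumption:

```python
import numpy as np

def asymmetric_loss(u, tau, power=2):
    """Weighted asymmetric loss |tau - 1{u<0}| * |u|**power on residuals u.

    power=1 gives the quantile check loss; power=2 the expectile loss.
    """
    return np.abs(np.where(u < 0, tau - 1.0, tau)) * np.abs(u) ** power
```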

11.
Technical advances in many areas have produced more complicated high-dimensional data sets than the usual high-dimensional data matrix, such as fMRI data collected over a period for independent trials, or expression levels of genes measured in different tissues. Multiple measurements exist for each variable in each sample unit of these data. Regarding the multiple measurements as an element in a Hilbert space, we propose Principal Component Analysis (PCA) in Hilbert space. The principal components (PCs) thus defined carry information about not only the patterns of variation in individual variables but also the relationships between variables. To extract the features with the greatest contributions to the explained variation in PCs for high-dimensional data, we also propose sparse PCA in Hilbert space by imposing a generalized elastic-net constraint. Efficient algorithms to solve the optimization problems in our methods are provided. We also propose a criterion for selecting the tuning parameter.

12.
Using networks as prior knowledge to guide model selection is a way to achieve structured sparsity. In particular, the fused lasso, originally designed to penalize differences between coefficients of successive features, has been generalized to handle features whose effects are structured according to a given network. Like any prior information, the network supplied in the penalty may contain misleading edges that connect coefficients whose difference is not zero, and the extent to which the performance of the method depends on the suitability of the graph has never been clearly assessed. In this work we investigate the theoretical and empirical properties of the adaptive generalized fused lasso in the context of generalized linear models. In the fixed-p setting, we show that, asymptotically, adding misleading edges to the graph does not prevent the adaptive generalized fused lasso from enjoying asymptotic oracle properties, while omitting suitable edges can be more problematic. These theoretical results are complemented by an extensive simulation study that assesses the robustness of the adaptive generalized fused lasso against misspecification of the network, as well as its applicability when the theoretical coefficients are not exactly equal. A further contribution is an evaluation of the generalized fused lasso for the joint modeling of multiple sparse regression functions. Illustrations are provided on two real data examples.
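The generalized fused penalty referred to here extends the chain-structured fused lasso to an arbitrary edge set; a compact sketch of the penalty value, where the per-edge weights correspond to the adaptive variant and all names are illustrative:

```python
import numpy as np

def gen_fused_penalty(beta, edges, lam1, lam2, w=None):
    """lam1 * ||beta||_1 + lam2 * sum over edges w_ij * |beta_i - beta_j|.

    edges: list of (i, j) index pairs defining the graph.
    """
    w = np.ones(len(edges)) if w is None else w
    fused = sum(wk * abs(beta[i] - beta[j]) for wk, (i, j) in zip(w, edges))
    return lam1 * np.abs(beta).sum() + lam2 * fused
```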

13.
The fused lasso penalizes a loss function by the L1 norm of both the regression coefficients and their successive differences, to encourage sparsity in both. In this paper, we propose Bayesian generalized fused lasso modeling based on a normal-exponential-gamma (NEG) prior distribution, placed on the differences of successive regression coefficients. The proposed method enables us to construct a more versatile sparse model than the ordinary fused lasso by using a flexible regularization term. Simulation studies and real data analyses show that the proposed method outperforms the ordinary fused lasso.

14.
Huang J  Ma S  Li H  Zhang CH 《Annals of statistics》2011,39(4):2021-2046
We propose a new penalized method for variable selection and estimation that explicitly incorporates the correlation patterns among predictors. This method is based on a combination of the minimax concave penalty and a Laplacian quadratic associated with a graph as the penalty function. We call it the sparse Laplacian shrinkage (SLS) method. The SLS uses the minimax concave penalty to encourage sparsity and the Laplacian quadratic penalty to promote smoothness among coefficients associated with correlated predictors. The SLS has a generalized grouping property with respect to the graph represented by the Laplacian quadratic. We show that the SLS possesses an oracle property in the sense that it is selection consistent and equal to the oracle Laplacian shrinkage estimator with high probability. This result holds in sparse, high-dimensional settings with p ≫ n under reasonable conditions. We derive a coordinate descent algorithm for computing the SLS estimates. Simulation studies are conducted to evaluate the performance of the SLS method, and a real data example is used to illustrate its application.
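A sketch of an SLS-style penalty value, combining the MCP with a Laplacian quadratic; the 1/2 factor on the quadratic term and the parameter names are assumptions, not necessarily the paper's exact scaling:

```python
import numpy as np

def sls_penalty(beta, L, lam1, gamma, lam2):
    """Minimax concave penalty plus the Laplacian quadratic beta' L beta.

    L: graph Laplacian matrix encoding predictor correlations.
    """
    a = np.abs(beta)
    mcp = np.where(a <= gamma * lam1,
                   lam1 * a - a ** 2 / (2 * gamma),  # concave region
                   0.5 * gamma * lam1 ** 2).sum()    # flat beyond gamma*lam1
    return mcp + 0.5 * lam2 * beta @ L @ beta
```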

15.
The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the 'fused lasso', a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences, i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the 'hinge' loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.

16.
In calibration of a near-infrared (NIR) instrument, we regress some chemical compositions of interest as a function of their NIR spectra. This process poses two immediate challenges: first, the number of variables exceeds the number of observations and, second, the multicollinearity between variables is extremely high. To deal with these challenges, prediction models that produce sparse solutions have recently been proposed. The term 'sparse' means that some model parameters are estimated to be exactly zero while the others are estimated away from zero; in effect, variable selection is embedded in the model, potentially achieving better prediction. Many studies have investigated sparse solutions for latent variable models, such as partial least squares and principal component regression, and for direct regression models such as ridge regression (RR), where the objective function typically carries an L1-norm penalty, as in lasso regression. In this study, we investigate new sparse alternatives to RR within a random effects model framework, placing Cauchy and mixture-of-normals distributions on the random effects. The results indicate that the mixture-of-normals model produces a sparse solution with good prediction and better interpretation. We illustrate the methods using NIR spectra data sets from milk and corn specimens.

17.
We propose covariance-regularized regression, a family of methods for prediction in high-dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve superior prediction. An estimate of the inverse covariance matrix is obtained by maximizing the log-likelihood of the data, under a multivariate normal model, subject to a penalty; it is then used to estimate coefficients for the regression of the response onto the features. We show that ridge regression, the lasso and the elastic net are special cases of covariance-regularized regression, and we demonstrate that certain previously unexplored forms of covariance-regularized regression can outperform existing methods in a range of situations. The covariance-regularized regression framework is extended to generalized linear models and linear discriminant analysis, and is used to analyse gene expression data sets with multiple class and survival outcomes.
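A sketch in the spirit of this framework: estimate a penalized precision matrix with scikit-learn's GraphicalLasso, then plug it into the normal-equations form of the regression coefficients. The plug-in step is an illustrative simplification, not the paper's exact estimator:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def cov_regularized_coefs(X, y, alpha=0.05):
    """Regression coefficients from an L1-shrunken inverse covariance of X."""
    theta = GraphicalLasso(alpha=alpha).fit(X).precision_  # shrunken inverse covariance
    Xc = X - X.mean(axis=0)
    s_xy = Xc.T @ (y - y.mean()) / len(y)                  # sample cross-covariance
    return theta @ s_xy                                    # plug-in coefficients
```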

18.
Calibration techniques in survey sampling, such as generalized regression estimation (GREG), were formalized in the 1990s to produce efficient estimators of linear combinations of study variables, such as totals or means. They implicitly rest on the assumption of a linear regression model between the variable of interest and some auxiliary variables, yielding estimates with lower variance if the model is true while remaining approximately design-unbiased even if it does not hold. We propose a new class of model-assisted estimators obtained by relaxing a few calibration constraints and replacing them with a penalty term added to the distance criterion being minimized. By introducing the concept of penalized calibration, which combines usual calibration with this 'relaxed' calibration, we can adjust the weight given to the available auxiliary information. The result is a more flexible estimation procedure that gives better estimates, particularly when the auxiliary information is overly abundant or not fully appropriate for complete use. The approach can also be seen as a design-based alternative to estimation procedures based on the more general class of mixed models, opening new prospects in areas of application such as inference on small domains.
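For orientation, a minimal sketch of ordinary (non-penalized) calibration weights under the chi-square distance, the baseline that the penalized variant relaxes; `d` are design weights, `t_x` known auxiliary totals, and the closed form follows from the Lagrangian:

```python
import numpy as np

def calibrate(d, X, t_x):
    """Weights w minimizing sum((w - d)**2 / d) subject to X.T @ w = t_x."""
    Xd = X * d[:, None]                              # design-weighted auxiliaries
    lam = np.linalg.solve(X.T @ Xd, t_x - X.T @ d)   # Lagrange multipliers
    return d * (1 + X @ lam)                         # calibrated weights
```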

19.
Sparsity-inducing penalties are useful tools for variable selection and are also effective for regression problems where the data are functions. We consider the problem of selecting not only variables but also decision boundaries in multiclass logistic regression models for functional data, using sparse regularization. The parameters of the functional logistic regression model are estimated within the penalized likelihood framework with a sparse group lasso-type penalty, and the tuning parameters are then selected by a model selection criterion. The effectiveness of the proposed method is investigated through simulation studies and the analysis of a gene expression data set.

20.
In this article, we present a compressive-sensing-based framework for generalized linear model regression that employs a two-component noise model and convex optimization techniques to simultaneously detect outliers and determine optimally sparse representations of noisy data from arbitrary sets of basis functions. We first extend our model to include model order reduction capabilities that can uncover inherent sparsity in regression coefficients and achieve simple, superior fits. Second, we use the mixed ℓ2/ℓ1 norm to develop another model that can efficiently uncover block-sparsity in regression coefficients. By performing model order reduction over all independent variables and basis functions, our algorithms successfully de-emphasize the effect of independent variables that become uncorrelated with dependent variables. This desirable property has various applications in real-time anomaly detection, such as faulty sensor detection and sensor jamming in wireless sensor networks. After developing our framework and inheriting a stable recovery theorem from compressive sensing theory, we present two simulation studies on sparse or block-sparse problems that demonstrate the superior performance of our algorithms with respect to (1) classic outlier-invariant regression techniques such as least absolute value and iteratively reweighted least squares and (2) classic sparse-regularized regression techniques such as LASSO.
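One standard convex route to the joint outlier-plus-sparsity estimation described above is to augment the design with an identity block so that a single ℓ1 penalty also selects per-observation outlier terms. The sketch below is that simplification (with a common penalty level), not the authors' algorithm:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_outlier_fit(X, y, alpha=0.1):
    """Simultaneous sparse regression and outlier detection via [X | I]."""
    n, p = X.shape
    Z = np.hstack([X, np.eye(n)])           # last n columns absorb per-obs outliers
    fit = Lasso(alpha=alpha).fit(Z, y)
    beta, outliers = fit.coef_[:p], fit.coef_[p:]
    return beta, np.flatnonzero(outliers)   # nonzero entries flag suspected outliers
```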
