Similar Literature
20 similar documents retrieved.
1.
The rapid development of computing has greatly facilitated data acquisition and storage. Many enterprises have accumulated large amounts of data whose dimensionality keeps growing and which contain ever more noise variables, so a key problem in modeling is selecting a small number of important variables from a high-dimensional set. For proportion data whose response takes values in the interval (0, 1), this paper proposes regularized Beta regression and studies penalized maximum likelihood estimation and its asymptotic properties under three penalties: LASSO, SCAD, and MCP. Simulation studies show that MCP outperforms SCAD and LASSO, and that as the sample size grows SCAD also comes to outperform LASSO. Finally, the method is applied to a study of the determinants of dividend yields of Chinese listed companies.
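As a rough illustration of what such a penalized fit looks like, here is a minimal sketch of LASSO-penalized maximum likelihood for Beta regression with a logit link; the toy data, the derivative-free solver, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def neg_penalized_beta_loglik(params, X, y, lam):
    """LASSO-penalized negative log-likelihood of a Beta regression
    with a logit link for the mean and a common precision phi."""
    beta, log_phi = params[:-1], params[-1]
    mu = expit(X @ beta)                 # mean in (0, 1)
    phi = np.exp(log_phi)                # precision > 0
    a, b = mu * phi, (1.0 - mu) * phi    # standard Beta(a, b) shape parameters
    loglik = np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                    + (a - 1.0) * np.log(y) + (b - 1.0) * np.log(1.0 - y))
    penalty = lam * np.sum(np.abs(beta[1:]))    # do not penalize the intercept
    return -loglik + penalty

# toy data: response strictly inside (0, 1)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 5))])
y = np.clip(rng.beta(2.0, 5.0, size=200), 1e-4, 1 - 1e-4)

start = np.zeros(X.shape[1] + 1)
fit = minimize(neg_penalized_beta_loglik, start, args=(X, y, 0.5),
               method="Powell")          # derivative-free, tolerates the L1 kink
print(fit.x[:-1])                        # penalized coefficient estimates
```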

2.
Building on Bayesian learning, this paper develops regularization from the perspective of Bayesian analysis. Under the assumptions that the response follows a normal distribution and the regression coefficients follow an exponential-family prior, Bayes' rule yields the relationship between the penalty factor and the variances of the response and of the coefficients, and this result is applied to choosing the penalty factor in ridge and lasso regression. An empirical test shows that when both the response and the coefficients are normally distributed, setting the penalty factor to the ratio of the two variances fits better than the ridge trace method and generalized cross-validation (GCV).
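A small sketch of the variance-ratio rule in the ridge case: under a Gaussian prior β ~ N(0, τ²I) and noise variance σ², the posterior mode is the ridge estimator with penalty λ = σ²/τ². The toy data and the assumption that σ² and τ² are known are illustrative.

```python
import numpy as np

def ridge_variance_ratio(X, y, sigma2, tau2):
    """Ridge estimate with the penalty set to the noise-to-prior variance ratio."""
    lam = sigma2 / tau2                       # Bayesian choice of the penalty factor
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
beta_true = rng.normal(scale=1.0, size=4)             # tau^2 = 1
y = X @ beta_true + rng.normal(scale=0.5, size=100)   # sigma^2 = 0.25

print(ridge_variance_ratio(X, y, sigma2=0.25, tau2=1.0))
```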

3.
Variable selection is a fundamental tool for high-dimensional statistical models. In regression, the SCAD penalty not only selects the correct model well but also estimates the parameters and enjoys the oracle property; these good properties, however, rest on choosing a suitable tuning parameter. Most domestic work on tuning-parameter selection has focused on the variable-selection problem itself. This paper uses the newer ERIC criterion to choose the tuning parameter for SCAD-penalized generalized linear models and proves that, under certain conditions, the model selected by this criterion is consistent. Simulations and an empirical analysis show that ERIC outperforms the traditional CV, AIC, and BIC criteria for tuning-parameter selection.
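For reference, a short sketch of the SCAD penalty itself (the usual piecewise form with a = 3.7); this shows only the penalty function, not the ERIC tuning-parameter criterion the abstract proposes.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|), evaluated elementwise over t."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                             # |t| <= lam
    quad   = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # lam < |t| <= a*lam
    const  = lam**2 * (a + 1) / 2                                # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

print(scad_penalty([0.1, 1.0, 5.0], lam=0.5))
```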

4.
For high-dimensional mixed-effects models, this paper proposes a doubly regularized quantile regression method. Applying an L1 penalty to the random-effect and fixed-effect coefficients simultaneously both selects the important explanatory variables and removes the bias caused by individual random fluctuations. The alternating iterative algorithm used for estimation not only resolves the difficulty of having to determine two tuning parameters at once but is also fast. Simulations show that the method is highly robust to the error distribution and performs well across different degrees of sparsity, especially in the high-dimensional case where the number of explanatory variables exceeds the sample size. To make it easier to choose the optimal regularization parameters in practice, two parameter-selection criteria are compared. Finally, the new method is illustrated on an educational data set, identifying the important factors that affect student performance at each quantile.

5.
In regression problems, penalizing features, i.e., regularization, is a common way to handle features, but in ensemble classification there has been little work on penalizing features to improve training. This paper proposes an ensemble model that uses SHAP values obtained from training a GBDT model to penalize and weight the features of each sample and thereby improve classification accuracy; the SHAP values of a test sample are approximated by combining distance-based weights to the training samples with the SHAP matrix of the training set. Experiments show that after penalizing features with the GBDT_SHAP values, prediction accuracy improves significantly, confirming the effectiveness of the algorithm. Taking the GBDT_SHAP_GBDT model as an example, it classifies well on several classic data sets and performs particularly well on imbalanced data; several simulation experiments show that the method lets the model reach a good and fairly stable fit quickly and is robust.
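A rough sketch of the pipeline described here: fit a GBDT, extract per-sample SHAP values, and re-weight features before refitting. The weighting shown (scaling each feature by its mean absolute SHAP value) is a simplified stand-in for the paper's distance-weighted construction, and the use of the `shap` package is an assumption about tooling.

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# stage 1: fit a GBDT and compute SHAP values on the training set
gbdt = GradientBoostingClassifier(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(gbdt).shap_values(X)   # (n_samples, n_features) for a binary GBDT

# simplified feature "penalty" weights: mean absolute SHAP value per feature
weights = np.abs(shap_values).mean(axis=0)
X_weighted = X * weights                                # down-weights uninformative features

# stage 2: refit a GBDT on the re-weighted features (the GBDT_SHAP_GBDT idea)
gbdt2 = GradientBoostingClassifier(random_state=0).fit(X_weighted, y)
print(gbdt2.score(X_weighted, y))
```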

6.
A feature-selection model for imbalanced data based on AUC regression
To address generalization prediction and feature selection for imbalanced data, an AUC regression model with the MCP penalty (MCP-AUCR) is proposed. The model optimizes an objective function that uses information from all thresholds, which equips it to handle imbalanced data and gives it good feature-selection performance. After discussing the model's definition and rationale, a cyclic coordinate-descent training algorithm is proposed, and numerical simulations verify its good properties. A financial distress warning model based on MCP-AUCR is then built for listed companies in the machinery, equipment and instruments sector of the Chinese stock market. The results show that this warning model selects interpretable, important financial indicators and predicts effectively, clearly outperforming traditional models.
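As a reference point, a short sketch of the minimax concave penalty (MCP) used in MCP-AUCR, with concavity parameter γ; this is only the penalty function, not the AUC-regression objective or its coordinate-descent solver.

```python
import numpy as np

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP), evaluated elementwise over t."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = lam * t - t**2 / (2 * gamma)   # |t| <= gamma * lam
    outer = gamma * lam**2 / 2             # |t| >  gamma * lam: the penalty levels off
    return np.where(t <= gamma * lam, inner, outer)

print(mcp_penalty([0.1, 1.0, 5.0], lam=0.5))
```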

7.
关欣  王征 《统计与决策》2016,(17):179-181
There has been considerable domestic empirical work on Logistic regression and BP neural network models for financial distress warning. Most of it argues, on the grounds of high predictive accuracy, that both models are usable, but it does not specifically examine the probabilities of Type I errors (misclassifying a firm in financial distress as financially healthy) and Type II errors (misclassifying a healthy firm as in distress). Drawing on the principles of the Logistic regression and BP neural network models, this paper runs an empirical study on financial data of listed companies. The results show that the BP neural network has higher overall predictive accuracy and a lower Type I error rate, making it a useful reference for financial warning analysis; the Logistic regression model is less accurate than the BP neural network and its Type I error rate is far higher, so it should be used with great caution for financial distress warning.
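A tiny illustration of the two error rates discussed above, computed from a confusion matrix with scikit-learn; the label convention (1 = financial distress) and the toy predictions are assumptions for the example.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# 1 = financial distress, 0 = financially healthy (illustrative labels)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
type1 = fn / (fn + tp)   # distressed firm judged healthy (the costlier mistake here)
type2 = fp / (fp + tn)   # healthy firm judged distressed
print(f"Type I error rate: {type1:.2f}, Type II error rate: {type2:.2f}")
```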

8.
An application of negative binomial regression models to overdispersed claim counts
徐飞 《统计教育》2009,(4):53-55
Poisson regression is the usual choice for modeling claim counts, but when the counts are overdispersed it is no longer suitable. This paper discusses negative binomial regression models under two distributional forms and fits them to a set of auto-insurance data, obtaining a clear improvement in fit.
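A minimal sketch of the kind of comparison the abstract describes: Poisson versus negative binomial regression on simulated overdispersed counts, assuming statsmodels; the simulated data stand in for the auto-insurance data used in the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(500, 2)))
mu = np.exp(X @ np.array([0.3, 0.5, -0.4]))
# overdispersed counts: gamma-mixed Poisson, i.e. negative binomial
y = rng.poisson(rng.gamma(shape=1.5, scale=mu / 1.5))

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin_fit  = sm.NegativeBinomial(y, X).fit(disp=False)   # also estimates the dispersion alpha

print("Poisson AIC:", poisson_fit.aic)
print("NegBin  AIC:", negbin_fit.aic)
```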

9.
In ultra-high-dimensional data, the number of covariates can greatly exceed the sample size and may even grow exponentially with it; moreover, such data are usually heterogeneous, with covariates affecting the center of the conditional distribution very differently from how they affect its tails, and heavy tails and outliers can further complicate matters. This paper studies variable selection and robust estimation in partially linear additive quantile regression when the covariate dimension is diverging and ultra-high. First, to achieve sparsity together with nonparametric smoothness, a nonconvex Atan double penalty is introduced, and a quantile iterative coordinate-descent algorithm is used to solve the resulting optimization problem. With a suitably chosen regularization parameter, the theoretical properties of the proposed double-penalized estimator are established. Second, simulation studies validate the performance of the method; the results show it outperforms other penalized methods, especially with heavy-tailed data. Finally, an empirical analysis of blood-sample data from cancer-screening patients demonstrates its practical value.

10.
A comparison of two house-price forecasting methods based on grey system theory
Taking grey system theory as its foundation, this paper builds a GM(1,1) model and a simple linear regression model that incorporates grey theory to forecast housing prices. An empirical analysis of the housing market in Shanghai's Pudong New Area finds that the GM(1,1) model outperforms the grey simple linear regression model in both goodness of fit and prediction accuracy, and that GM(1,1) is better suited to situations with little sample data.
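A compact sketch of the classical GM(1,1) recursion referenced here: accumulate the series, form background values, estimate the develop coefficient a and grey input b by least squares, then back-difference the fitted accumulated series. The toy price series is made up for illustration.

```python
import numpy as np

def gm11_forecast(x0, steps=3):
    """Classical GM(1,1): fit on the positive series x0 and forecast `steps` ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                               # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]      # develop coefficient, grey input
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]                                # first fitted value equals the first observation
    x0_hat[1:] = np.diff(x1_hat)                     # restore the original scale by differencing
    return x0_hat

prices = [18.2, 19.1, 20.3, 21.8, 23.0]              # toy price series
print(gm11_forecast(prices, steps=2))
```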

11.
In this article, we develop a generalized penalized linear unbiased selection (GPLUS) algorithm. The GPLUS is designed to compute the paths of penalized logistic regression based on the smoothly clipped absolute deviation (SCAD) and minimax concave (MCP) penalties. The main idea of the GPLUS is to compute possibly multiple local minimizers at individual penalty levels by continuously tracing the minimizers across penalty levels. We demonstrate the feasibility of the proposed algorithm in logistic and linear regression. The simulation results favor the selection accuracy of the SCAD and MCP across a suitable range of penalty levels.

12.
This paper considers the problem of variable selection in quantile regression with autoregressive errors. Recently, Wu and Liu (2009) investigated the oracle properties of the SCAD and adaptive-LASSO penalized quantile regressions under the assumption of independent but not identically distributed errors. We further relax the error assumptions so that the regression model can accommodate autoregressive errors, and then investigate the theoretical properties of our proposed penalized quantile estimators under the relaxed assumption. Optimizing the objective function is often challenging because both the quantile loss and the penalty functions may be non-differentiable and/or non-convex. We adopt the pseudo-data concept of Oh et al. (2007) to implement a practical algorithm for the quantile estimates, and we discuss the convergence properties of the proposed algorithm. The performance of the proposed method is compared with those of the majorization-minimization algorithm (Hunter and Li, 2005) and the difference convex algorithm (Wu and Liu, 2009) through numerical and real examples.
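For readers new to these penalized quantile objectives, a two-line sketch of the quantile (check) loss ρ_τ that they all build on; the autoregressive-error handling and the pseudo-data algorithm from the paper are not shown.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

residuals = np.array([-1.5, -0.2, 0.3, 2.0])
print(check_loss(residuals, tau=0.9).sum())   # objective contribution at the 0.9 quantile
```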

13.
Feature selection for high-dimensional sparse data is the key to clustering analysis of online public-opinion texts. Borrowing the idea of penalized models, a penalized multinomial mixture model is used to perform feature selection by penalizing more heavily the features that do not significantly affect the clustering result; this effectively picks out the typical words representing the different viewpoints in public opinion and performs quite well in an empirical application.

14.
Recent studies have demonstrated the theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. Computing the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to existing algorithms that use a local quadratic or local linear approximation to the penalty function, the MMCD majorizes the negative log-likelihood by a quadratic loss but does not approximate the penalty at all. This strategy makes it possible to avoid computing a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish the theoretical convergence properties of the MMCD. We implement the algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD is sufficiently fast for penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size.
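A heavily simplified sketch in the spirit of the MMCD idea: the averaged logistic negative log-likelihood is majorized coordinate-wise by a quadratic with the curvature bound p(1−p) ≤ 1/4, and each coordinate is updated by firm thresholding for the MCP. This is an illustrative toy (no intercept, fixed sweep count, γ chosen so that γ·v_j > 1), not the authors' algorithm.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mmcd_logistic_mcp(X, y, lam, gamma=8.0, n_sweeps=200):
    """Toy coordinate descent for MCP-penalized logistic regression.
    The averaged negative log-likelihood is majorized coordinate-wise by a
    quadratic with fixed curvature v_j = sum_i x_ij^2 / (4n), using p(1-p) <= 1/4."""
    n, p = X.shape
    beta = np.zeros(p)
    v = (X ** 2).sum(axis=0) / (4.0 * n)    # the MCP update below needs gamma * v_j > 1
    for _ in range(n_sweeps):
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
            grad_j = X[:, j] @ (prob - y) / n
            u = beta[j] - grad_j / v[j]                  # minimizer of the unpenalized majorizer
            if np.abs(u) > gamma * lam:                  # flat part of the MCP: no shrinkage
                beta[j] = u
            else:                                        # firm thresholding
                beta[j] = soft(v[j] * u, lam) / (v[j] - 1.0 / gamma)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
true_margin = X[:, 0] - X[:, 1]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-true_margin))).astype(float)
print(np.round(mmcd_logistic_mcp(X, y, lam=0.05), 3))
```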

15.
A number of nonstationary models have been developed to estimate extreme events as functions of covariates. A quantile regression (QR) model is a statistical approach intended to estimate, and conduct inference about, conditional quantile functions. In this article, we focus on simultaneous variable selection and parameter estimation through penalized quantile regression. We compare regularized quantile regression models with B-splines in a Bayesian framework. Regularization is based on a penalty and aims to favor a parsimonious model, especially when the dimension is large. The prior distributions related to the penalties are detailed. Five penalties (Lasso, Ridge, SCAD0, SCAD1 and SCAD2) are considered, together with their equivalent expressions in the Bayesian framework. The regularized quantile estimates are then compared with the maximum likelihood estimates with respect to the sample size. Markov chain Monte Carlo (MCMC) algorithms are developed for each hierarchical model to simulate the conditional posterior distribution of the quantiles. Results indicate that SCAD0 and Lasso perform best for quantile estimation according to the relative mean bias (RMB) and relative mean error (RME) criteria, especially in the case of heavy-tailed errors. A case study of the annual maximum precipitation at Charlo, Eastern Canada, with the Pacific North Atlantic climate index as a covariate is presented.

16.
High-dimensional data arise in diverse fields of science, engineering and the humanities, and variable selection plays an important role in high-dimensional statistical modelling. In this article, we study variable selection via quadratic approximation with the smoothly clipped absolute deviation (SCAD) penalty when the number of parameters diverges. We provide a unified method to select variables and estimate parameters for a variety of high-dimensional models. Under appropriate conditions and with a proper regularization parameter, we show that the estimator is consistent and sparse, and that the estimators of the nonzero coefficients enjoy the same asymptotic normality as they would if the zero coefficients were known in advance. In addition, under some mild conditions, we can obtain the global solution of the penalized objective function with the SCAD penalty. Numerical studies and a real data analysis confirm the performance of the proposed method.

17.
This article proposes a variable selection procedure for partially linear models with right-censored data via penalized least squares. We apply the SCAD penalty to select significant variables and estimate the unknown parameters simultaneously. The sampling properties of the proposed procedure are investigated: the rate of convergence and the asymptotic normality of the proposed estimators are established, and the SCAD-penalized estimators of the nonzero coefficients are shown to have the asymptotic oracle property. In addition, an iterative algorithm is proposed to solve the penalized least squares problem. Simulation studies examine the finite-sample performance of the proposed method.

18.
In finite mixtures of location–scale distributions, if there is no constraint or penalty on the parameters, then the maximum likelihood estimator does not exist because the likelihood is unbounded. To avoid this problem, we consider a penalized likelihood, where the penalty is a function of the minimum of the ratios of the scale parameters and the sample size. It is shown that the penalized maximum likelihood estimator is strongly consistent. We also analyse the consistency of a penalized maximum likelihood estimator where the penalty is imposed on the scale parameters themselves.

19.
In this paper, we discuss the selection of random effects within the framework of generalized linear mixed models (GLMMs). Based on a reparametrization of the covariance matrix of the random effects in terms of a modified Cholesky decomposition, we propose adding a shrinkage penalty term to the penalized quasi-likelihood (PQL) function of the variance components in order to select effective random effects. The shrinkage penalty is taken as a function of the variances of the random effects, motivated by the fact that if a variance is zero then the corresponding variable is no longer random (with probability one). The proposed method takes advantage of the convenient computation of the PQL estimate and the appealing properties of certain shrinkage penalties such as the LASSO and SCAD. We propose a backfitting algorithm to estimate the fixed effects and variance components in GLMMs, which simultaneously selects effective random effects. Simulation studies show that the proposed approach performs well in selecting effective random effects in GLMMs, and a real-data analysis using the proposed approach is also presented.

20.
Wu Y  Li L 《Statistica Sinica》2011,(21):707-730
We investigate the asymptotic properties of a family of sufficient dimension reduction estimators when the number of predictors p diverges to infinity with the sample size. We adopt a general formulation of dimension reduction estimation through least squares regression of a set of transformations of the response. This formulation allows us to establish the consistency of the reduction projection estimation. We then introduce the SCAD max penalty, along with a difference convex optimization algorithm, to achieve variable selection. We show that the penalized estimator selects all truly relevant predictors and excludes all irrelevant ones with probability approaching one, while maintaining consistent estimation of the reduction basis for the relevant predictors. Our work differs from most model-based selection methods in that it does not require a traditional model, and it extends existing sufficient dimension reduction and model-free variable selection approaches from the fixed-p scenario to diverging p.
