Similar literature
20 similar documents found
1.
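Application of the random forest algorithm to urban air quality prediction   Total citations: 1 (self-citations: 0, citations by others: 1)
Haze events have become increasingly frequent in recent years, and air quality has drawn growing attention. Using the previous day's pollution indicators (such as PM2.5 and PM10 concentrations) and meteorological indicators (such as temperature, humidity, and wind speed) as predictors, the article applies the classification and regression functions of the random forest algorithm and uses cross-validation to build an air quality prediction model. Compared with models built using Boosting, Bagging, decision trees, and support vector machines, the random forest model shows higher prediction accuracy, stronger generalization ability, and better robustness, and it offers some guidance for urban air quality forecasting.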

2.
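Liu Zhan (刘展) et al. Statistical Research (《统计研究》), 2021, 38(11): 130-140
With the rapid development of big data and Internet technology, web surveys are being used ever more widely. This paper proposes a random forest propensity score inference method for web survey samples: a random forest composed of a number of classification trees is built to estimate the propensity scores of the web survey sample units, which then supports inference about the population. Simulation and empirical results show that the relative bias, variance, and mean squared error of the population mean estimator based on the random forest propensity score model are all smaller than those of the estimator based on the logistic propensity score model, so the proposed method gives better estimates.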
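One way to picture the inference step in entry 2: once each web survey unit has an estimated propensity score, the population mean can be estimated by weighting each response by the inverse of its score. The sketch below is a minimal Python illustration under assumptions not stated in the abstract. The function name `ipw_mean` is hypothetical, the scores are taken as given (in the paper they would come from a random forest), and the weighted-mean form is one common choice, not necessarily the authors' estimator.

```python
from typing import Sequence


def ipw_mean(y: Sequence[float], propensity: Sequence[float]) -> float:
    """Inverse-propensity-weighted mean (hypothetical helper, illustration only).

    y          : observed responses of the sampled units
    propensity : estimated probability that each unit takes part in the survey
    """
    if len(y) != len(propensity) or not y:
        raise ValueError("y and propensity must be non-empty and equally long")
    if any(not 0.0 < p <= 1.0 for p in propensity):
        raise ValueError("propensity scores must lie in (0, 1]")
    # Units that were unlikely to respond stand in for more of the population.
    weights = [1.0 / p for p in propensity]
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)
```

For example, `ipw_mean([10.0, 20.0], [0.5, 0.25])` gives the second unit twice the weight of the first.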

3.
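The purpose of pruning a random forest is to identify its important classifiers, so that the pruned sub-forest is both interpretable and makes full use of the information in the data. The article proposes a new method for pruning random forests: based on the margin function of the samples, a stepwise backward algorithm yields nested sub-forests, and the 1-SE rule is used to select the optimal sub-forest. In a comparison with existing random forest pruning methods on two real data sets, the proposed method performs better on three measures: the distribution of the prediction rate of the pruned sub-forest, the distribution of the number of classifiers in the sub-forest, and the explanatory variables selected.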
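The 1-SE rule mentioned in entry 3 can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function name `one_se_choice` and its inputs (candidate sub-forest sizes with an estimated error and standard error for each) are assumptions made for the example.

```python
from typing import Sequence


def one_se_choice(sizes: Sequence[int],
                  errors: Sequence[float],
                  std_errs: Sequence[float]) -> int:
    """Return the smallest size whose error is within one SE of the best error."""
    if not (len(sizes) == len(errors) == len(std_errs)) or not sizes:
        raise ValueError("inputs must be non-empty and equally long")
    # Index of the candidate with the lowest estimated error.
    best = min(range(len(sizes)), key=lambda i: errors[i])
    threshold = errors[best] + std_errs[best]
    # Among all candidates that are statistically "as good", prefer the simplest.
    return min(sizes[i] for i in range(len(sizes)) if errors[i] <= threshold)
```

The design choice is to trade a small, statistically insignificant loss in accuracy for a smaller and therefore more interpretable sub-forest.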

4.
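To address the limitation that traditional nonparametric modal regression models ignore complex interactions among explanatory variables, the article combines modal regression with machine learning and proposes a new nonparametric modal regression model, the modal regression forest. The model fully accounts for interactions among the explanatory variables and uses Bagging to aggregate the results of multiple modal regression trees, which improves predictive performance. Numerical simulations show, first, that compared with the linear modal regression model and the modal regression tree model, the modal regression forest greatly improves estimation and prediction accuracy; and second, that when the data are skewed, its estimation and prediction accuracy is significantly better than that of the median regression forest and mean regression forest models. In addition, when applied to income distribution research, the modal regression forest yields results that differ from those of the median and mean regression forests.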

5.
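Using four selected technical indicators and six macroeconomic indicators as predictors of the treasury bond futures index, the study builds four machine learning prediction models with the random forest algorithm. A tracking trading rule is designed based on the volatility clustering of prices, and the predictive ability of the macroeconomic indicators, the technical indicators, and the random forest algorithm for the treasury bond futures index is tested by comparing the four models' prediction accuracy and tracking trading returns. The results show that the model built from technical indicators refined by principal components achieves tracking trading returns clearly better than the market return, but worse than those obtained by following the empirical trading rule of a single technical indicator. The model built from technical indicators refined by principal components together with macroeconomic indicators achieves good prediction accuracy and tracking trading returns. This indicates that both macroeconomic and technical indicators have predictive value for treasury bond futures prices, and that the random forest algorithm can be used to build an effective quantitative investment model for treasury bond futures.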

6.
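A random forest algorithm based on recursive feature elimination   Total citations: 1 (self-citations: 0, citations by others: 1)
The article studies variable selection with random forests in the presence of correlated predictors. In high-dimensional regression or classification settings, variable selection is a difficult task, and it becomes even more challenging when the predictors are highly correlated. The article provides a theoretical study of the permutation importance measure in a regression model, which makes it possible to describe how correlation among predictors affects the importance ranking. In contrast to the original random forest algorithm, which selects variables by importance ranking, the study uses recursive feature elimination (RFE) for variable selection. Experiments show that the RFE-RF method substantially helps machine learning algorithms predict correctly.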
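The recursive feature elimination loop in entry 6 can be sketched in a few lines. This is a generic Python illustration under stated assumptions: `recursive_feature_elimination` and `importance_fn` are hypothetical names, and `importance_fn` stands in for "refit a random forest on the surviving variables and return their permutation importances", which the sketch does not implement.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
) -> Tuple[List[str], List[str]]:
    """Drop the least important feature one at a time until n_keep remain.

    Returns (kept features, eliminated features in order of removal).
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    eliminated: List[str] = []
    while len(remaining) > n_keep:
        # Importances are recomputed on the current subset at every step;
        # this is what distinguishes RFE from a single one-off ranking.
        scores = importance_fn(remaining)
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated
```

Recomputing the ranking after each removal matters when predictors are correlated, because removing one variable can change the importance of the others.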

7.
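Water demand forecasting plays an important role in effective water resource management. The article applies the random forest method to an empirical study of water demand forecasting. The experimental results show that the random forest method does not overfit because of outliers in the training set, and that the model is highly robust. Among the explanatory variables for regional water demand, regional population and irrigated area are the most influential. The conclusions and methods can help management authorities manage water demand more effectively.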

8.
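The article builds single attribute test and decision tree models, combined with the AdaBoost ensemble algorithm, for predicting the risk of derivative financial instruments. It describes in detail the mechanism by which single attribute tests and decision trees are combined as classifiers with the AdaBoost algorithm, and defines 12 risk detection indicators. Using 252 listed Chinese companies as the initial sample, it runs 26 risk prediction experiments over one-, two-, and three-year horizons with the AdaBoost single attribute test (SAT), the AdaBoost decision tree (DT), a single decision tree, and a single support vector machine (SVM). The results show that the AdaBoost-based risk prediction model can effectively predict companies' derivative financial instrument risk.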

9.
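Taking both financial and non-financial factors of firms into account, the article uses the LASSO method to screen indicators for predicting corporate financial distress, and then builds prediction models with four data mining methods (decision tree, random forest, SVM, and nearest neighbor) as well as the common logistic model. The results show that the role of non-financial factors in predicting financial distress cannot be ignored; that not all data mining methods outperform the commonly used logistic model; and that LASSO preserves prediction accuracy while reducing dimensionality, yielding a more parsimonious model.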

10.
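For VaR combination forecasting, the article analyzes the characteristics of three ways of using the sample (fixed window, rolling window, and expanding window) and then uses Shanghai Composite Index data to empirically analyze the performance of VaR combination forecasts under each of the three. The conclusion is that the way the sample is used has a significant effect on the performance of VaR combination forecasts.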
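The three sampling schemes in entry 10 differ only in which observations feed each forecast. The following Python sketch shows one common reading of them; the function name `estimation_windows` and the half-open index convention are assumptions for illustration, and the abstract does not give the authors' exact definitions.

```python
from typing import List, Tuple


def estimation_windows(n_obs: int, window: int, scheme: str) -> List[Tuple[int, int, int]]:
    """List (train_start, train_end, target) triples, with train_end exclusive.

    fixed     : always estimate on the first `window` observations
    rolling   : estimate on the most recent `window` observations
    expanding : estimate on every observation before the target
    """
    if scheme not in ("fixed", "rolling", "expanding"):
        raise ValueError("scheme must be 'fixed', 'rolling' or 'expanding'")
    if not 0 < window < n_obs:
        raise ValueError("window must satisfy 0 < window < n_obs")
    result = []
    for target in range(window, n_obs):
        if scheme == "fixed":
            start, end = 0, window
        elif scheme == "rolling":
            start, end = target - window, target
        else:  # expanding
            start, end = 0, target
        result.append((start, end, target))
    return result
```

With six observations and a window of three, the rolling scheme slides the training range forward by one each period, while the expanding scheme keeps the start fixed and lets the range grow.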

11.
Summary. Editing in surveys of economic populations is often complicated by the fact that outliers due to errors in the data are mixed in with correct, but extreme, data values. We describe and evaluate two automatic techniques for the identification of errors in such long-tailed data distributions. The first is a forward search procedure based on finding a sequence of error-free subsets of the error-contaminated data and then using regression modelling within these subsets to identify errors. The second uses a robust regression tree modelling procedure to identify errors. Both approaches can be implemented on a univariate basis or on a multivariate basis. An application to a business survey data set that contains a mix of extreme errors and true outliers is described.

12.
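Neural network models and prediction of auto insurance claim frequency   Total citations: 1 (self-citations: 0, citations by others: 1)
Meng Shengwang (孟生旺). Statistical Research (《统计研究》), 2012, 29(3): 22-26
Auto insurance attracts wide public attention and is central to the business of property insurers, so claim frequency models for auto insurance have long been a focus of non-life actuarial theory and practice. The most popular claim frequency models today are generalized linear models, including Poisson regression, negative binomial regression, and Poisson-inverse Gaussian regression. Using a set of real auto insurance loss data, this paper compares various generalized linear models of claim frequency with a neural network model and a regression tree model, and reaches some new conclusions: the neural network model fits better than the generalized linear models, and among the generalized linear models, Poisson regression fits better than negative binomial and Poisson-inverse Gaussian regression. The linear regression model fits worst, and the regression tree model fits slightly better than the linear regression model.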

13.
Nonparametric seemingly unrelated regression provides a powerful alternative to parametric seemingly unrelated regression for relaxing the linearity assumption. The existing methods are limited, particularly with sharp changes in the relationship between the predictor variables and the corresponding response variable. We propose a new nonparametric method for seemingly unrelated regression, which adopts a tree-structured regression framework, has satisfactory prediction accuracy and interpretability, places no restriction on the inclusion of categorical variables, and is less vulnerable to the curse of dimensionality. Moreover, an important feature is that it constructs a unified tree-structured model for multivariate data, even when the predictor variables corresponding to each response variable are entirely different. This unified model can offer revelatory insights such as underlying economic meaning. We propose the key components of tree-structured regression: an impurity function that detects complex nonlinear relationships between the predictor variables and the response variable, split rule selection with negligible selection bias, and tree size determination that addresses underfitting and overfitting. We demonstrate our proposed method using simulated data and illustrate it using data from the Korea stock exchange sector indices.

14.
The Bayesian CART (classification and regression tree) approach proposed by Chipman, George and McCulloch (1998) entails putting a prior distribution on the set of all CART models and then using stochastic search to select a model. The main thrust of this paper is to propose a new class of hierarchical priors which enhance the potential of this Bayesian approach. These priors indicate a preference for smooth local mean structure, resulting in tree models which shrink predictions from adjacent terminal nodes towards each other. Past methods for tree shrinkage have searched for trees without shrinking, and applied shrinkage to the identified tree only after the search. By using hierarchical priors in the stochastic search, the proposed method searches for shrunk trees that fit well and improves the tree through shrinkage of predictions.

15.
Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many irrelevant variables and the number of predictors exceeds the number of observations. We propose the multistep regression tree with adaptive variable selection to handle this problem. The variable selection step and the fitting step comprise the multistep method.

The multistep generalized unbiased interaction detection and estimation (GUIDE) with adaptive forward selection (fg) algorithm, as a variable selection tool, performs better than some of the well-known variable selection algorithms such as efficacy adaptive regression tube hunting (EARTH), FSR (false selection rate), LSCV (least squares cross-validation), and LASSO (least absolute shrinkage and selection operator) for the regression problem. Simulation results show that fg outperforms the other algorithms in terms of selection result and computation time. It generally selects the important variables correctly with relatively few irrelevant variables, which gives good prediction accuracy with less computation time.

16.
This article investigates the existence of multiple regimes in the U.S. economy during the 1923-1991 period. A technique known as regression tree analysis is applied to search for splits in the data, if any exist, rather than choosing a splitting point a priori as has been done in previous work. Using this technique, strong evidence for the existence of nonlinear behavior of U.S. output is found over this period. Monte Carlo results are presented to assess the significance of the regime changes that are found.

17.
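Applying classification and regression trees with the rpart package in R   Total citations: 5 (self-citations: 0, citations by others: 5)
For many classification and regression problems, binary trees offer an interesting and visual way to explore data. The method splits the independent variables according to certain rules to arrive at a sensible classification of the dependent variable, and can then predict unknown cases. Alongside an introduction to recursive partitioning and regression trees in R, the article analyzes a prostate cancer data set by combining survival analysis with classification and regression trees, and reaches conclusions of practical value for disease diagnosis and prevention.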
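The core step of recursive partitioning described in entry 17 is the search for a single best split. The sketch below illustrates that step in Python for one numeric predictor, using the sum of squared errors as the split criterion. It is a teaching sketch under those assumptions, with hypothetical function names `sse` and `best_split`; it is not the rpart implementation and does not call R.

```python
from typing import List, Optional, Sequence, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total SSE) of the best binary split on x, or None.

    Candidate thresholds are midpoints between consecutive distinct x values.
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best
```

A full tree would apply `best_split` to each resulting subset in turn until a stopping rule is met; that recursion is omitted here to keep the example short.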

18.
The performance of computationally inexpensive model selection criteria in the context of tree structured prediction is discussed. It is shown through a simulation study that no one model selection criterion exhibits a uniformly superior performance over a wide range of scenarios. Therefore, a two-stage approach for model selection is suggested and shown to perform satisfactorily. A computationally efficient method of tree-growing within the RECursive Partition and AMalgamation (RECPAM) framework is suggested. The computationally efficient algorithm gives results identical to those of the original RECPAM tree-growing algorithm. An example of medical data analysis for developing prognostic classification is presented.

19.
Often, categorical ordinal data are clustered using a well-defined similarity measure for this kind of data and then using a clustering algorithm not specifically developed for them. The aim of this article is to introduce a new clustering method designed specifically for ordinal data. Objects are grouped using a multinomial model, a cluster tree and a pruning strategy. Two types of pruning are analyzed through simulations. The proposed method makes it possible to overcome two typical problems of cluster analysis: the choice of the number of groups and scale invariance.

20.
This paper obtains asymptotic representations of a class of L-estimators in a linear regression model when the errors are a function of long-range-dependent Gaussian random variables. These representations are then used to address some of the efficiency robustness properties of L-estimators compared to the least-squares estimator. It is observed that under the Gaussian error distribution, each member of the class has the same asymptotic efficiency as that of the least-squares estimator. The results are obtained as a consequence of the asymptotic uniform linearity of some weighted empirical processes based on long-range-dependent random variables.
