20 similar articles found.
1.
2.
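Small-Sample Statistical Decision Theory: Statistical Learning Theory (total citations: 1; self-citations: 0; citations by others: 1)
Statistical learning theory, proposed by Vapnik and colleagues, is a finite-sample statistical theory and a recent development in pattern recognition. It focuses on statistical regularities and on the properties of statistical decision methods in the small-sample setting. It establishes a sound theoretical framework for small-sample statistical decision problems and has produced a new general-purpose learning method for statistical decisions, the support vector machine, which handles such problems well. This paper introduces the basic ideas, characteristics, and current state of research in statistical learning theory, along with some reflections.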
3.
《统计与信息论坛》, 2019, (5): 69-78
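In recent years, generalized linear models (GLMs) have been widely used in auto insurance pricing, and some studies have shown that machine learning outperforms GLMs in certain respects, but each of those results rests on a single data set. To compare GLMs and machine learning methods more comprehensively on the problem of predicting auto insurance claim frequency, comparative tests were run on seven auto insurance data sets, with machine learning methods including deep learning, random forests, support vector machines, and XGBoost. On the same training sets, different GLMs were fitted to predict claim frequency, and the best GLM was selected by minimizing the Akaike information criterion (AIC); the best machine learning parameters and models were obtained by tuning with cross-validation. The results show that XGBoost consistently predicts better than the GLM on all data sets, and that for some data sets with many predictors and strong correlation among variables, neural networks, deep learning, and random forests predict better than the GLM.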
4.
5.
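I. Statement of the Problem. Association rule mining learns from known data to find meaningful dependencies within it. It can be used for prediction, decision-making, and classification, and is therefore a problem on which the machine learning field concentrates. The main differences between data mining problems and machine learning are these: data mining must process large volumes of data, so learning must be highly efficient; in addition, the rules or patterns obtained by data mining ...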
6.
7.
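A Combination Forecasting Method Based on Support Vector Machines and Its Application (total citations: 1; self-citations: 0; citations by others: 1)
The support vector machine (SVM) is a new general-purpose learning method proposed by Vapnik and colleagues on the basis of statistical learning theory. Built on the VC-dimension theory and the structural risk minimization principle of statistical learning theory, it handles practical problems such as small samples, nonlinearity, high dimensionality, and local minima well, and it improves the generalization ability of the learning machine. It has become one of the research hotspots in the machine learning community and has been applied successfully to classification, function approximation, and time series prediction. Meanwhile, combination forecasting, which draws on the strengths of multiple methods, is receiving growing attention and wide application. This paper uses support vector machines to construct a new combination forecasting method with high prediction accuracy and strong generalization ability. Applied to forecasting the total number of health technical personnel in Hebei Province, the method achieved very good predictive results.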
8.
《统计研究》, 2020, (2): F0003-F0003
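China's economy has entered the "new normal" and a stage of high-quality development, bringing new opportunities and challenges for economic statistics research. On the one hand, combining data science methods such as machine learning with economic statistics has improved the research methods of economic statistics; on the other hand, unstructured data such as text, speech, and images provide new data sources for economic statistics research. Studying how to use new technologies and methods such as big data and machine learning to advance economic statistics will inject new vitality into the field. To promote research in modern economic statistics in China and to give researchers a platform for exchange and cooperation, the editorial office of 《统计研究》, together with Xiamen University, Zhejiang University of Finance and Economics, Zhejiang Gongshang University, Jiangxi University of Finance and Economics, Southwestern University of Finance and Economics, Jinan University, Zhongnan University of Economics and Law, and other institutions, jointly founded the "Forum on Data Science and Modern Economic Statistics".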
9.
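The support vector machine (SVM) is a machine learning algorithm based on statistical learning theory that can classify well even with small samples. This article proposes an SVM-based model for predicting financial distress in listed companies. Company financial indicators serve as the SVM inputs; because they are numerous, principal component analysis is used to reduce the dimension of the SVM input vector. Comparison with multivariate statistical methods and with Logit and Probit models shows that the method has high prediction accuracy and markedly lower Type I and Type II errors.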
10.
11.
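The emergence of learning organization theory offers a new perspective for understanding the role of education for statistical personnel more objectively and comprehensively. After discussing the basic tenets and significance of the learning organization, this paper proposes concrete ideas for building a learning statistical organization.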
12.
A consumer's choice of a product on a single purchase occasion may involve multiple varieties and be influenced by past experience of using the product. To mimic the real situation, this article proposes a new dynamic multiple-variety choice (DMC) model which incorporates quantitative and qualitative dynamics into an additive utility function. This model exhibits three major features of consumer purchase behavior: more than one variety purchased, learning behavior from use experience, and forgetting with the passage of time. All these are achieved by combining a simultaneous demand model with Bayesian learning theory embedded in an exponential function. The model is tested and validated using Hong Kong television viewing data. Empirical results show that including Bayesian learning in a multiple-choice model significantly improves model performance and prediction accuracy, and that accounting for the effect of forgetting when studying learning behavior makes the Bayesian learning model much more accurate in practical application.
13.
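Analysis of the Energy-Saving Potential of China's Provinces and Regions during the 11th Five-Year Plan (total citations: 9; self-citations: 0; citations by others: 9)
Abstract: Based on an analysis of historical series, this paper proposes a "learning curve" for regional economy and energy consumption. Using time series data for 30 provinces and regions from 1990 to 2005, it builds an energy learning curve in which energy consumption per 10,000 yuan of output value declines gradually as per capita GDP rises, and it calculates, for the 11th Five-Year Plan period, each province's energy-saving potential per 10,000 yuan of output value and its share of the national total energy-saving target. The results show that: (1) provinces such as Shanxi, Gansu, and Guizhou have relatively large energy-saving potential per 10,000 yuan of output value, while Fujian, Zhejiang, Guangdong, Jiangsu, Beijing, and others have relatively small potential; (2) taking each province's economic size and energy consumption together, large energy consumers such as Shandong, Shanxi, Hebei, and Liaoning and economically strong provinces such as Guangdong, Jiangsu, Zhejiang, and Shanghai make large energy-saving contributions and bear high shares, whereas provinces such as Hainan contribute relatively little because of their small economies. The paper thereby gives the regional distribution of energy-saving potential per 10,000 yuan of output value and of energy-saving contribution shares for China during the 11th Five-Year Plan period.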
14.
15.
This paper develops a general approach to quantifying the size of generalization errors for margin-based classification. A trade-off between geometric margins and training errors is exhibited along with the complexity of a binary classification problem. This makes it possible to treat learning theory in a broader framework, in particular one that handles both convex and non-convex margin classifiers, including support vector machines, kernel logistic regression, and ψ-learning. Examples for both linear and nonlinear classifications are provided.
16.
17.
Ontology learning has become a major area of research within current ontology engineering. The acquisition of concepts and taxonomic relations is one of the key issues of ontology learning. There are various ways to build a taxonomy for the same domain and related concepts. However, current ontology learning tools have not considered how to generate the most suitable taxonomy for a specific application. In this article, we present a new supervised approach for taxonomic relationship learning. Taking Chinese unstructured text on the Web as the corpus and a particular domain application as the guide, this method uses a given branch of the target ontology and integrates multiple strategies, including shallow syntactic analysis and several kinds of statistical measures, to extract concepts and to determine the concepts' hierarchy numbers and is-a relations. Experiments with a working prototype of the system showed improved performance in learning domain ontology taxonomic relations.
18.
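To address the slow convergence of the traditional backpropagation (BP) learning algorithm and its strong dependence on step size, a BP algorithm that searches for a better step size is proposed. During network training, it can find a relatively reasonable step size at each iteration, which greatly reduces the influence of step-size choice on learning speed. Simulation results for economic forecasting show that the new algorithm depends less on step-size choice than the traditional BP algorithm.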
19.
Distance learning can be useful for bridging geographical barriers to education in rural settings. However, empirical evidence on the equivalence of distance education and traditional face-to-face (F2F) instruction in statistics and biostatistics is mixed. Despite the difficulty in randomization, we minimized intra-instructor variation between F2F and online sections in seven graduate-level biostatistics service courses in a synchronous (live, real time) fashion; that is, for each course taught in a traditional F2F setting, a separate set of students were taught simultaneously via online learning technology, allowing for two-way interaction between instructor and students. Our primary objective was to compare student performance in the two courses that use these two teaching modes. We used a Bayesian hierarchical model to test equivalence of modes. A frequentist mixed-model analysis was also conducted for reference. The results of Bayesian and frequentist methods agree and suggest a difference of less than 1% in average final grades. Finally, we discuss barriers to instruction and learning using the applied online teaching technology.
20.
This article describes a new approach to learning curve estimation. Our approach is to formulate statistical procedures that conform to alternative learning curve theories. This leads to the development of nonlinear statistical models of the learning curves. For the three data sets analyzed, autocorrelation seems to be an important problem. Parameter estimates were derived using the maximum likelihood principle in the presence of first-order autocorrelation. Nonnested tests were used to select the appropriate formulation of the learning curve. The conclusions are to use unit data when estimating a learning curve and to be prepared to treat autocorrelation if present.