Similar Documents
1.
Ensemble algorithms have become a major focus of machine learning research, and many improved ensemble algorithms have been proposed, but ensemble studies on "ill-conditioned" data remain uncommon. Based on a case study of algae growth, this paper proposes an ensemble algorithm built on the block bootstrap technique. A comparison with several commonly used ensemble algorithms shows that, for some "ill-conditioned" data, the proposed algorithm often achieves a smaller generalization error and higher prediction accuracy than the alternatives.
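A minimal sketch of the block-bootstrap idea behind such an ensemble: each resample is built from contiguous blocks of rows rather than individual rows, one base regressor is trained per resample, and predictions are averaged. The base learner, block size, and function names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def block_bootstrap_indices(n, block_size, rng):
    """Draw contiguous blocks of row indices until n rows are collected (assumes n >= block_size)."""
    idx = []
    while len(idx) < n:
        start = rng.integers(0, n - block_size + 1)
        idx.extend(range(start, start + block_size))
    return np.array(idx[:n])

def block_bootstrap_ensemble(X, y, n_estimators=50, block_size=10, seed=0):
    """Train one base regressor per block-bootstrap resample."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = block_bootstrap_indices(len(y), block_size, rng)
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def ensemble_predict(models, X):
    # average the base regressors' predictions
    return np.mean([m.predict(X) for m in models], axis=0)
```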

2.
Selective ensemble algorithms are currently one of the focal topics in machine learning. Building on a case study of algae growth, this paper proposes a fast selective Bagging Trees ensemble algorithm based on k-means clustering. Compared with traditional statistical methods and several commonly used machine learning methods, the proposed algorithm achieves a smaller generalization error and higher prediction accuracy, and its running efficiency is also considerably improved.
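A sketch of one plausible reading of the k-means pruning step, in which the bagged trees are clustered by their validation-set prediction vectors and a single representative per cluster is retained; the paper's exact clustering target and selection rule are not given here, so treat the details below as assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingRegressor

def kmeans_selective_bagging(X_train, y_train, X_val, n_trees=100, n_keep=10, seed=0):
    """Cluster bagged trees by their validation predictions; keep one tree per cluster."""
    bag = BaggingRegressor(n_estimators=n_trees, random_state=seed).fit(X_train, y_train)
    preds = np.array([t.predict(X_val) for t in bag.estimators_])   # shape (n_trees, n_val)
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(preds)
    selected = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        # representative: the tree closest to its cluster centre
        d = np.linalg.norm(preds[members] - km.cluster_centers_[c], axis=1)
        selected.append(bag.estimators_[members[np.argmin(d)]])
    return selected

def predict_selected(trees, X):
    return np.mean([t.predict(X) for t in trees], axis=0)
```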

3.
陈凯, 朱钰, 王征. 《统计教育》, 2008(6): 24-28
Through a study of the iris data, this paper proposes an improved Bagging Trees ensemble algorithm that performs fast selection based on differences in the classification performance of the base classifiers. Comparisons with other statistical machine learning methods such as CART, Bagging Trees, Random Forest, and the popular genetic-algorithm-based selective ensemble algorithm GASEN show that, for classification problems, the proposed algorithm achieves higher accuracy, and its running efficiency is considerably better than that of GASEN.
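A minimal sketch of fast selection by classifier performance, under the simplifying assumption that "difference in classification effect" is measured by held-out accuracy and only the top-k trees are kept; the paper's precise selection criterion may differ.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

def accuracy_pruned_bagging(X_tr, y_tr, X_val, y_val, n_trees=100, n_keep=20, seed=0):
    """Keep only the k bagged trees with the highest validation accuracy."""
    bag = BaggingClassifier(n_estimators=n_trees, random_state=seed).fit(X_tr, y_tr)
    acc = np.array([t.score(X_val, y_val) for t in bag.estimators_])
    return [bag.estimators_[i] for i in np.argsort(acc)[::-1][:n_keep]]

def vote(trees, X):
    """Majority vote over the kept trees (assumes integer class labels, as in the iris data)."""
    preds = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, preds)
```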

4.
A key research focus in data mining (machine learning) is building models for concept-drift data, and the core problem is the drift-detector algorithm. This paper proposes a detection algorithm based on dual windows. Its advantages are a rigorous theoretical foundation, effectively improved mining efficiency, and robustness against virtual (spurious) drift. Experiments on both synthetic and real data show that it also outperforms other algorithms.
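The abstract names the detector only as "dual-window". A generic illustration of that idea, assuming drift is flagged when the error rate in a short recent window is significantly higher than in a long reference window (two-proportion z-test); the window sizes, threshold, and test are assumptions, not the paper's algorithm.

```python
import math
from collections import deque

class DualWindowDetector:
    """Flag concept drift when the recent window's error rate is significantly
    above the reference window's (two-proportion z-test). Illustrative sketch only."""
    def __init__(self, ref_size=500, recent_size=50, z_threshold=3.0):
        self.ref = deque(maxlen=ref_size)
        self.recent = deque(maxlen=recent_size)
        self.z_threshold = z_threshold

    def update(self, error):
        """error: 1 if the base model misclassified the latest example, else 0."""
        self.ref.append(error)
        self.recent.append(error)
        if len(self.ref) < self.ref.maxlen or len(self.recent) < self.recent.maxlen:
            return False                       # not enough history yet
        p_ref = sum(self.ref) / len(self.ref)
        p_rec = sum(self.recent) / len(self.recent)
        p_pool = (sum(self.ref) + sum(self.recent)) / (len(self.ref) + len(self.recent))
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / len(self.ref) + 1 / len(self.recent)))
        return se > 0 and (p_rec - p_ref) / se > self.z_threshold
```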

5.
With the great advances in financial technology, applications of machine learning in financial risk control have deepened. The credit scorecard, the most widely used risk assessment model, is limited in the big-data era by its inability to fully analyze high-dimensional, complex, nonlinear personal credit data. Starting from the actual state of Internet finance in China, this paper proposes an Internet-finance risk control model based on the XGBoost machine learning algorithm, compares it experimentally with the traditional statistical scorecard model, and gives a method for converting the machine learning model's predictions into traditional credit scores. The results show that the machine learning model predicts individual credit risk better and thus supports a more effective risk control system.
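The abstract mentions converting the machine learning model's predictions into a traditional credit score. A common scorecard-style conversion (base score plus PDO log-odds scaling) is sketched below together with a hedged XGBoost setup; the base score, base odds, PDO, and all hyperparameters are illustrative and not taken from the paper.

```python
import numpy as np
from xgboost import XGBClassifier

def prob_to_score(p_bad, base_score=600, base_odds=1 / 20, pdo=50):
    """Map predicted default probability to a scorecard-style score:
    score = base_score + (PDO / ln 2) * ln(base_odds / odds_of_bad)."""
    p = np.clip(p_bad, 1e-6, 1 - 1e-6)
    odds = p / (1 - p)
    return base_score + pdo / np.log(2) * np.log(base_odds / odds)

# usage sketch (X_train, y_train, X_test assumed to exist; y = 1 means default)
# model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
#                       scale_pos_weight=10, eval_metric="auc")
# model.fit(X_train, y_train)
# scores = prob_to_score(model.predict_proba(X_test)[:, 1])
```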

6.
Boosting algorithms are a family of sequential ensemble algorithms that can be used for classification and regression; different algorithms arise from different loss functions and different ways of combining the base learners. This paper proposes Baboost, a Boosting algorithm that adaptively handles class-imbalanced data in classification. Experiments show that the algorithm effectively reduces the prediction error within each class.
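The paper's Baboost algorithm is not specified in the abstract; as a rough illustration of imbalance-aware boosting, the sketch below merely starts AdaBoost from sample weights inversely proportional to class frequency, so minority-class errors are penalized more from the first round. This is a generic stand-in, not the paper's method.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def class_balanced_adaboost(X, y, n_estimators=100, seed=0):
    """Boosting with initial sample weights inversely proportional to class frequency."""
    classes, counts = np.unique(y, return_counts=True)
    freq = dict(zip(classes, counts))
    w0 = np.array([1.0 / freq[c] for c in y])
    w0 /= w0.sum()
    clf = AdaBoostClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(X, y, sample_weight=w0)
    return clf
```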

7.
A survey of machine learning and related algorithms
Ever since the computer was invented, people have wondered whether it can learn. Machine learning is, in essence, a multidisciplinary field that draws on results from artificial intelligence, probability and statistics, computational complexity theory, control theory, information theory, philosophy, physiology, neurobiology, and other disciplines. From the perspective of the foundations of statistical learning, this article briefly reviews the development of machine learning and introduces some commonly used algorithms.

8.
With the vigorous development of China's financial market, the reject inference problem in credit scoring has received increasing attention. To address the low proportion of labeled samples in credit scoring models and the class imbalance within those samples, this paper proposes a new algorithm, BCT, built on semi-supervised learning and ensemble learning. The algorithm improves traditional semi-supervised co-training by generating multiple sub-classifiers with dynamic Bagging, introducing a classification threshold parameter to handle class imbalance, and setting an early-stopping condition to avoid overfitting during the iterations. An empirical analysis on five real data sets shows that, across different data sets and rejection ratios, BCT outperforms credit scoring models based on six other supervised and semi-supervised learning algorithms, demonstrating good generalization and stronger model evaluation ability.
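BCT itself combines dynamic Bagging, a classification threshold, and early stopping within co-training; the sketch below illustrates only the simpler self-training analogue of those three ingredients (a bagging ensemble, a confidence threshold for pseudo-labels, and stopping when nothing passes the threshold) and is not the paper's algorithm.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

def bagged_self_training(X_lab, y_lab, X_unlab, threshold=0.9,
                         max_iter=10, n_estimators=25, seed=0):
    """Grow the labelled set with high-confidence pseudo-labels from a bagging
    ensemble; stop early when no unlabelled point passes the threshold."""
    X_l, y_l = X_lab.copy(), y_lab.copy()
    X_u = X_unlab.copy()
    for _ in range(max_iter):
        clf = BaggingClassifier(n_estimators=n_estimators, random_state=seed).fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        conf = proba.max(axis=1)
        keep = conf >= threshold                 # classification threshold parameter
        if not keep.any():                       # early stopping: nothing confident enough
            break
        X_l = np.vstack([X_l, X_u[keep]])
        y_l = np.concatenate([y_l, clf.classes_[proba[keep].argmax(axis=1)]])
        X_u = X_u[~keep]
    return clf
```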

9.
The support vector machine (SVM) is a very popular classification algorithm in data mining and has received wide attention. As data leakage becomes increasingly prominent, privacy protection in data mining has become a research hotspot, yet studies on privacy protection for SVMs are scarce. We propose an SVM privacy-preserving algorithm based on rotation perturbation: it introduces an orthogonal rotation transformation and has the property of zero classification loss. Using conventional data security evaluation methods and data from the UCI machine learning repository, we analyze the algorithm's level of privacy. Theoretical verification and experimental results show that the proposed SVM privacy-preserving algorithm performs satisfactorily.
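A sketch of the rotation-perturbation idea: publish the data multiplied by a random orthogonal matrix. Because rotation preserves inner products and distances, kernels such as the linear or RBF kernel give the same decision quality on the rotated data, which is the "zero classification loss" property. The matrix construction below is a standard one and is an assumption, not necessarily the paper's.

```python
import numpy as np
from sklearn.svm import SVC

def random_orthogonal_matrix(d, seed=0):
    """Draw a random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))      # sign fix so the distribution is uniform (Haar)

# usage sketch: X, y assumed to exist; the data owner publishes only X @ R
# R = random_orthogonal_matrix(X.shape[1])
# clf_plain = SVC(kernel="rbf").fit(X, y)        # trained on the original data
# clf_rot   = SVC(kernel="rbf").fit(X @ R, y)    # same accuracy on rotated data
```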

10.
An economic forecasting model based on wavelet support vector machines
Statistical learning theory, recently proposed by Vapnik and colleagues, and the support vector machine (Support Vector Machines, SVM) method developed from it have shown excellent performance in regression and are regarded as an alternative to neural networks; they are also beginning to be applied to time-series forecasting. Both in theory and in practice, SVMs perform well and have promising applications in nonlinear time-series prediction. This paper combines wavelet theory with the SVM method to exploit the strengths of both and proposes a new machine learning method called the wavelet support vector machine (Wavelet Support Vector Machines, WSVM). The method uses wavelet basis functions to construct the SVM kernel, yielding a new SVM model that retains all the advantages of the SVM while also removing high-frequency noise from the data and thus offering good noise resistance. Applying this new method to economic forecasting yields high prediction accuracy, indicating that WSVM is a promising machine learning method.
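The abstract says the kernel is built from wavelet basis functions. As a stand-in (the paper's exact construction is not given here), the sketch below uses the widely cited product-form Morlet-type wavelet kernel, passed to a support vector regressor as a callable kernel; the dilation parameter a and the regression setup are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def morlet_wavelet_kernel(a=1.0):
    """Translation-invariant wavelet kernel
    K(x, z) = prod_i cos(1.75 * (x_i - z_i) / a) * exp(-(x_i - z_i)^2 / (2 a^2))."""
    def kernel(X, Z):
        diff = X[:, None, :] - Z[None, :, :]                 # shape (n_X, n_Z, d)
        return np.prod(np.cos(1.75 * diff / a) * np.exp(-diff ** 2 / (2 * a ** 2)), axis=2)
    return kernel

# usage sketch: X, y assumed to be a scaled design matrix of lagged economic series and targets
# wsvm = SVR(kernel=morlet_wavelet_kernel(a=2.0), C=10.0)
# wsvm.fit(X, y)
```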

11.
The ensemble Kalman filter is an ABC algorithm
The ensemble Kalman filter is the method of choice for many difficult high-dimensional filtering problems in meteorology, oceanography, hydrology and other fields. In this note we show that a common variant of the ensemble Kalman filter is an approximate Bayesian computation (ABC) algorithm. This is of interest for a number of reasons. First, the ensemble Kalman filter is an example of an ABC algorithm that predates the development of ABC algorithms. Second, the ensemble Kalman filter is used for very high-dimensional problems, whereas ABC methods are normally applied only in very low-dimensional problems. Third, recent state-of-the-art extensions of the ensemble Kalman filter can also be understood within the ABC framework.
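For reference, a minimal sketch of the perturbed-observation analysis step of the ensemble Kalman filter, the variant usually given the ABC interpretation; dimensions and naming are illustrative, and the forecast step is not shown.

```python
import numpy as np

def enkf_analysis(ensemble, y_obs, H, R, rng):
    """Perturbed-observation EnKF analysis step.
    ensemble: (N, d) forecast members; y_obs: (p,) observation;
    H: (p, d) linear observation operator; R: (p, p) observation covariance."""
    N = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)            # state anomalies, (N, d)
    Y = X @ H.T                                     # predicted-observation anomalies, (N, p)
    P_yy = Y.T @ Y / (N - 1) + R                    # innovation covariance
    P_xy = X.T @ Y / (N - 1)                        # state-observation cross covariance, (d, p)
    K = P_xy @ np.linalg.inv(P_yy)                  # Kalman gain, (d, p)
    y_pert = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=N)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T
```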

12.
Most statistical and data-mining algorithms assume that data come from a stationary distribution. However, in many real-world classification tasks, data arrive over time and the target concept to be learned from the data stream may change accordingly. Many algorithms have been proposed for learning drifting concepts. To deal with the problem of learning when the distribution generating the data changes over time, dynamic weighted majority was proposed as an ensemble method for concept drift. Unfortunately, this technique considers neither the age of the classifiers in the ensemble nor their past correct classifications. In this paper, we propose a method that takes into account each expert's age as well as its contribution to the global algorithm's accuracy. We evaluate the effectiveness of the proposed method using m classifiers trained on an n-fold partitioning of the data. Experimental results on a benchmark data set show that our method outperforms existing ones.
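A generic sketch of weighting experts by both age and track record, assuming classic multiplicative weighted-majority updates further scaled by each expert's running accuracy; this illustrates the idea only and is not the authors' exact scheme.

```python
import numpy as np

class AgeAccuracyWeightedMajority:
    """Weighted-majority voting where each expert's vote is scaled by its weight
    times its lifetime accuracy, so older experts with good records count for more."""
    def __init__(self, experts, beta=0.5):
        self.experts = experts                 # pre-trained classifiers with .predict
        self.beta = beta                       # multiplicative penalty for a mistake
        self.weights = np.ones(len(experts))
        self.correct = np.zeros(len(experts))  # lifetime correct predictions (proxy for track record)
        self.age = np.zeros(len(experts))      # lifetime examples seen

    def predict(self, x):
        votes = {}
        for i, e in enumerate(self.experts):
            label = e.predict([x])[0]
            acc = self.correct[i] / self.age[i] if self.age[i] > 0 else 1.0
            votes[label] = votes.get(label, 0.0) + self.weights[i] * acc
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        for i, e in enumerate(self.experts):
            self.age[i] += 1
            if e.predict([x])[0] == y_true:
                self.correct[i] += 1
            else:
                self.weights[i] *= self.beta   # weighted-majority penalty
        self.weights /= self.weights.max()     # rescale to avoid numerical underflow
```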

13.

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a ‘library’ of candidate prediction models. While SL has been widely studied in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of SL in its ability to predict the propensity score (PS), the conditional probability of treatment assignment given baseline covariates, using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also proposed a novel strategy for prediction modeling that combines SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.
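A minimal stacking analogue of the Super Learner for propensity-score prediction, sketched with scikit-learn's StackingClassifier: cross-validated probabilities from a small candidate library are combined by a logistic meta-learner. The library, fold count, and hyperparameters are assumptions, and the hdPS variable-selection step is not shown.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def make_super_learner(cv=5):
    """Cross-validated stacking over a small candidate library (a Super Learner analogue)."""
    library = [
        ("logit", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ]
    return StackingClassifier(estimators=library,
                              final_estimator=LogisticRegression(max_iter=1000),
                              stack_method="predict_proba", cv=cv)

# usage sketch: X (baseline covariates) and treatment (0/1) assumed to exist
# sl = make_super_learner()
# sl.fit(X, treatment)
# propensity_scores = sl.predict_proba(X)[:, 1]
```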

14.
At present, ensemble learning has exhibited its great power in stabilizing and enhancing the performance of some traditional variable selection methods such as lasso and genetic algorithms. In this paper, a novel bagging ensemble method called BSSW is developed to implement variable ranking and selection in linear regression models. Its main idea is to execute a stepwise search algorithm on multiple bootstrap samples. In each trial, a mixed importance measure is assigned to each variable according to the order in which it is selected into the final model as well as the improvement in model fit resulting from its inclusion. Based on the importance measure averaged across the bootstrap trials, all candidate variables are ranked and then judged to be important or not. To extend the scope of application, BSSW is extended to the situation of generalized linear models. Experiments carried out with simulated and real data indicate that BSSW achieves better performance in most studied cases when compared with several other existing methods.
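A simplified sketch of BSSW's core loop, assuming forward stepwise selection on ordinary bootstrap samples and using selection frequency as the importance measure; the paper's mixed importance (selection order plus fit improvement) and the GLM extension are not reproduced here.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

def bagged_stepwise_importance(X, y, n_boot=30, n_select=5, seed=0):
    """Run forward stepwise selection on many bootstrap samples and average
    how often each variable is selected (a simplified importance measure)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # ordinary bootstrap resample
        sfs = SequentialFeatureSelector(LinearRegression(),
                                        n_features_to_select=n_select,
                                        direction="forward", cv=3)
        sfs.fit(X[idx], y[idx])
        counts += sfs.get_support().astype(float)
    return counts / n_boot                                # higher value = ranked as more important
```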

15.
Artificial neural networks have been successfully applied to a variety of machine learning tasks, including image recognition, semantic segmentation, and machine translation. However, few studies have fully investigated ensembles of artificial neural networks. In this work, we investigated multiple widely used ensemble methods, including unweighted averaging, majority voting, the Bayes Optimal Classifier, and the (discrete) Super Learner, for image recognition tasks, with deep neural networks as candidate algorithms. We designed several experiments, with the candidate algorithms being the same network structure with different model checkpoints within a single training process, networks with the same structure but trained multiple times stochastically, and networks with different structures. In addition, we further studied the overconfidence phenomenon of the neural networks, as well as its impact on the ensemble methods. Across all of our experiments, the Super Learner achieved the best performance among all the ensemble methods in this study.
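A minimal sketch of the two simplest combiners studied here, unweighted averaging and majority voting over the networks' softmax outputs; the Bayes Optimal Classifier and Super Learner combiners are not shown.

```python
import numpy as np

def unweighted_average_ensemble(prob_list):
    """prob_list: list of (n_samples, n_classes) softmax outputs, one per network."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

def majority_vote_ensemble(prob_list):
    """Each network casts one vote (its argmax class) per sample."""
    votes = np.stack([p.argmax(axis=1) for p in prob_list])      # (n_models, n_samples)
    n_classes = prob_list[0].shape[1]
    counts = np.apply_along_axis(lambda v: np.bincount(v, minlength=n_classes), 0, votes)
    return counts.argmax(axis=0)                                 # per-sample majority class
```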

16.
In this paper, we propose the MulticlusterKDE algorithm applied to classify elements of a database into categories based on their similarity. MulticlusterKDE is centered on the multiple optimization of the kernel density estimator function with a multivariate Gaussian kernel. One of the main features of the proposed algorithm is that the number of clusters is an optional input parameter. Furthermore, it is very simple, easy to implement, well defined, stops after a finite number of steps, and always converges regardless of the data set. We illustrate our findings by implementing the algorithm in R software. The results indicate that the MulticlusterKDE algorithm is competitive when compared to the K-means, K-medoids, CLARA, DBSCAN and PdfCluster algorithms. Features such as simplicity and efficiency make the proposed algorithm an attractive and promising research field that can be used as a basis for its improvement and also for the development of new density-based clustering algorithms.
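MulticlusterKDE optimizes the Gaussian KDE directly; as a rough illustration of density-based clustering with a KDE (not the paper's algorithm), the sketch below instead uses mean-shift-style mode seeking and groups points that converge to the same mode. The bandwidth, tolerance, and grouping radius are assumptions.

```python
import numpy as np

def kde_mode_clustering(X, bandwidth=1.0, n_iter=50, tol=1e-3):
    """Move every point uphill on a Gaussian KDE via mean-shift updates,
    then label points whose modes coincide as one cluster."""
    modes = X.copy()
    for _ in range(n_iter):
        d2 = ((modes[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared distances
        w = np.exp(-d2 / (2 * bandwidth ** 2))                        # Gaussian kernel weights
        new = (w @ X) / w.sum(axis=1, keepdims=True)                  # weighted-mean shift step
        converged = np.abs(new - modes).max() < tol
        modes = new
        if converged:
            break
    labels = -np.ones(len(X), dtype=int)
    next_label = 0
    for i in range(len(X)):
        if labels[i] == -1:
            same = (labels == -1) & (np.linalg.norm(modes - modes[i], axis=1) < bandwidth / 2)
            labels[same] = next_label
            next_label += 1
    return labels
```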

17.
Recently, a new ensemble classification method named Canonical Forest (CF) has been proposed by Chen et al. [Canonical forest. Comput Stat. 2014;29:849–867]. CF has been proven to give consistently good results on many data sets, comparable to other widely used classification ensemble methods. However, CF requires adopting a feature reduction method before classifying high-dimensional data. Here, we extend CF to a high-dimensional classifier by incorporating a random feature subspace algorithm [Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell. 1998;20:832–844]. This extended algorithm is called HDCF (high-dimensional CF) as it is specifically designed for high-dimensional data. We conducted an experiment using three data sets – gene imprinting, oestrogen, and leukaemia – to compare the performance of HDCF with several popular and successful classification methods on high-dimensional data sets, including Random Forest [Breiman L. Random forest. Mach Learn. 2001;45:5–32], CERP [Ahn H, et al. Classification by ensembles from random partitions of high-dimensional data. Comput Stat Data Anal. 2007;51:6166–6179], and support vector machines [Vapnik V. The nature of statistical learning theory. New York: Springer; 1995]. Besides the classification accuracy, we also investigated the balance between sensitivity and specificity for all four classification methods.
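A sketch of the random-subspace component that HDCF adds, with plain decision trees standing in for Canonical Forest's base learners (CF itself is not available in standard libraries); the subspace size, ensemble size, and use of majority voting are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_ensemble(X, y, n_estimators=100, subspace_size=50, seed=0):
    """Each base learner sees only a random subset of features (random subspace method)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        feats = rng.choice(X.shape[1], size=min(subspace_size, X.shape[1]), replace=False)
        models.append((feats, DecisionTreeClassifier().fit(X[:, feats], y)))
    return models

def subspace_predict(models, X):
    """Majority vote over the subspace learners (assumes integer class labels)."""
    votes = np.stack([m.predict(X[:, feats]) for feats, m in models])
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```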
