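Research on Feature Selection Algorithms in Data Classification
Cite this article: ZHAO Yu, HUANG Si-ming, CHEN Rui. Research on Feature Selection Algorithms in Data Classification[J]. Chinese Journal of Management Science, 2013, 21(6): 38-46.
Authors: ZHAO Yu  HUANG Si-ming  CHEN Rui
Affiliation: Institute of Policy and Management, Chinese Academy of Sciences, Beijing 100190
Funding: National Science and Technology Support Program (2012BAK27B02); Major Research Special Project, Category B, of the Institute of Policy and Management, Chinese Academy of Sciences (Y201161Z04)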
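Abstract: Using a support vector machine (SVM) model based on semidefinite programming, with a combination of kernels over feature subspaces serving as the kernel mapping matrix, this paper proposes a new learning algorithm that integrates feature selection into the data classification process. First, the sample data are partitioned by feature into groups, and a kernel matrix is computed for each group. These kernel matrices are then linearly combined and incorporated into the semidefinite-programming-based SVM model, and the semidefinite programming SVM learner is solved to obtain the weight coefficient of each feature subspace. Next, feature contribution and support measures are built from the feature weight coefficients; they are used for feature selection and to control classification accuracy, the number of features, and the ability to classify samples of different classes. Finally, the number of features and the classification results are computed for each of three objectives: best classification accuracy, fewest features, and best generalization ability. In the empirical study, datasets from medicine, botany, text recognition, and credit, together with artificial datasets, are used to compare the feature selection performance of this method with that of SFS, Relief-F, and SBS. The results show that on real data the proposed method not only maintains good classification performance but also selects far fewer features than SFS, Relief-F, and SBS; on artificial data, it correctly selects the true features and removes the noise features.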

Keywords: data mining  feature selection  classification algorithm  kernel matrix  semidefinite programming
Received: 2011-03-30
Revised: 2013-04-30
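
The kernel-combination step described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the Gaussian (RBF) kernel, the `gamma` value, the feature grouping, and the weight values are all assumptions made for illustration, and the weights are simply given here, whereas the paper obtains them by solving a semidefinite-programming SVM.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def combined_kernel(X, groups, weights, gamma=1.0):
    """Weighted sum of one kernel matrix per feature group."""
    n = X.shape[0]
    K = np.zeros((n, n))
    for cols, w in zip(groups, weights):
        K += w * rbf_kernel(X[:, cols], gamma)
    return K

# Hypothetical data: 6 samples, 4 features split into 3 groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
groups = [[0, 1], [2], [3]]
# Placeholder weights; a zero weight means the group contributes nothing.
weights = np.array([0.7, 0.3, 0.0])
K = combined_kernel(X, groups, weights)
```

Because each group kernel is positive semidefinite and the weights are non-negative, the combined matrix `K` is also positive semidefinite, which is what allows it to serve as an SVM kernel. A group whose weight is zero drops out of `K` entirely, which is the sense in which the weights can drive feature selection.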

Research on Feature Selection Methods of Data Classification
ZHAO Yu, HUANG Si-ming, CHEN Rui. Research on Feature Selection Methods of Data Classification[J]. Chinese Journal of Management Science, 2013, 21(6): 38-46.
Authors:ZHAO Yu  HUANG Si-ming  CHEN Rui
Institution:Institute of Policy & Management, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Using a support vector machine (SVM) learner based on semidefinite programming, this paper proposes a new method that integrates feature selection into the classification optimization, so that globally optimal feature selection and data classification are achieved simultaneously. First, the features are divided into several groups, and a kernel matrix is calculated for each sub-feature space. A linear combination of these sub-feature kernel matrices is then used as the kernel mapping of the semidefinite SVM, and solving the global model yields all the linear weight coefficients. Contribution and support measures derived from the weight coefficients govern the classification rate and allow the selection to target the maximum classification rate, the minimum number of features, or the best generalization ability. Finally, the classification rate and the number of features are computed under each of these three objectives. For verification, medical, botanical, text recognition, credit, and artificial datasets are used to compare the integrated method with SFS, Relief-F, and SBS. The results indicate that the integrated method not only achieves better learning performance but also reduces the number of features more sharply than SFS, Relief-F, and SBS on some datasets.
Keywords:data mining  feature selection  data classification  kernel matrix  semidefinite programming  
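
For readers unfamiliar with the SFS baseline named in the abstract, sequential forward selection can be sketched as below. This is a generic illustration, not the experimental setup of the paper: the nearest-centroid training accuracy used as the scoring function and the synthetic data (one informative feature among noise features) are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def nearest_centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid classifier (illustrative score only)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[np.argmin(dists, axis=1)] == y))

def sequential_forward_selection(X, y, k, score=nearest_centroid_accuracy):
    """Greedily add the feature that gives the best score until k are chosen."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: feature 2 separates the classes, the others are pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 2] += 4.0 * y
chosen = sequential_forward_selection(X, y, k=2)
```

SFS evaluates features one at a time and must be told how many to keep, whereas the method in the abstract derives all group weights from a single optimization. SBS works in the opposite direction, starting from the full feature set and removing features greedily.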