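A Subjective Weight Constraint Method for SVM Classification of Imbalanced Datasets
Cite this article: DIAO Cui-xia (刁翠霞), CHEN Si-feng (陈思凤), LIU Ye-zheng (刘业政). A subjective weight constraint method for SVM classification of imbalanced datasets [J]. Journal of Industrial Engineering and Engineering Management (管理工程学报), 2012, 26(3): 146-150.
Authors: DIAO Cui-xia (刁翠霞), CHEN Si-feng (陈思凤), LIU Ye-zheng (刘业政)
Affiliations: 1. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, Anhui 230009
2. School of Management, Hefei University of Technology, Hefei, Anhui 230009; Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, Anhui 230009
Abstract: In binary classification with support vector machines (SVM), imbalanced datasets can be handled either by reducing the asymmetry of the sample information or by improving the algorithm. To address the imbalance between samples of small and medium-sized enterprises with and without financial risk, this paper classifies the samples with a new SVM model that carries a subjective weight constraint. Experiments show that the new model improves the recognition of enterprises with financial risk, i.e., the minority class, and that it is a new method for class imbalance learning.

Keywords: subjective weight constraint; imbalanced dataset; objective weights; integrated normal vector; SVM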

An Improved SVM Method with the Subjective Weights Constraint for Classifications on Imbalanced Datasets
DIAO Cui-xia, CHEN Si-feng, LIU Ye-zheng. An Improved SVM Method with the Subjective Weights Constraint for Classifications on Imbalanced Datasets[J]. Journal of Industrial Engineering and Engineering Management, 2012, 26(3): 146-150.
Authors: DIAO Cui-xia, CHEN Si-feng, LIU Ye-zheng
Institution: 1. School of Management, Hefei University of Technology, Hefei 230009, China; 2. Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei 230009, China
Abstract: Imbalanced datasets are prevalent in management practice, and developing methods to handle them has been one of the most challenging problems in business intelligence. The support vector machine (SVM) is a classification method based on statistical learning theory. Although SVM performs well on general datasets, there is still room for improvement on imbalanced ones. Two strategies are available when SVM is used to classify imbalanced datasets: reducing the asymmetry of the sample information, or improving the existing SVM algorithm. This paper takes the latter strategy and presents a new SVM algorithm. Experiments show that, compared with the existing SVM, the proposed algorithm better recognizes rare cases in imbalanced datasets.

When solving a linearly separable binary classification problem, SVM constructs a hyperplane whose normal vector can be regarded as the objective weights of the corresponding attributes: the larger the absolute value of a component, the more influence the corresponding attribute has on the decision. Inspired by methods for integrating subjective and objective weights in decision theory, this paper first converts the empirical value range of the subjective weights into upper and lower limits on the normal vector, which yields a constraint inequality. This inequality is then added to the constraints of the SVM quadratic program. A standard form of the quadratic program, called the SVM method subject to subjective weights, is derived via the Lagrangian.

We apply the proposed model to financial early warning for small and medium-sized enterprises (SMEs), where datasets are often imbalanced. An approximately linearly separable SVM model is developed to recognize SMEs with financial risk. We downloaded 2007 financial data for 130 SMEs (including 30 risk SMEs marked ST) from the CSMAR database and selected 12 financial attributes as indicators. First, we trained the approximately linearly separable SVM model on the dataset. If $w_j < 0$, the values of attribute $j$ were replaced by their negations. The model was then trained again, and we obtained the value of $\sum_{j=1}^{12} w_j$. Second, by simulation based on the dataset, we obtained the subjective weight $w_s$ ($0 \le w_s \le 1$), its standard deviation $\sigma$, and the inequality $B = (w_s + \lambda\sigma)\sum_{j=1}^{12} w_j \ge w \ge (w_s - \lambda\sigma)\sum_{j=1}^{12} w_j = A$ ($\lambda \ge 0$). Third, we obtained the integrated vector $w'$ by substituting $A$ and $B$ into the standard form of the proposed SVM method, using the quadprog function in MATLAB to train and test on the dataset.

Because increasing $\lambda$ may make a component $a_j$ of the vector $A$ negative, we conducted two experiments to test the robustness of the proposed model. In the first experiment, $a_j < 0$ was allowed. In the second, $a_j$ was set to 0 whenever its actual value was negative. For any component $w'_j$, the trend was basically the same in both experiments as the constraint imposed by the subjective weights became weaker. Once the restriction is relaxed far enough, the values of $w'$ in the two experiments coincide, and both equal the $w$ obtained by the existing SVM method, which has no subjective weight restriction. The experiments showed that when $\lambda = 2.3$, the proposed method recognizes 11 risk SMEs, about twice as many as the existing SVM method, and its recall for rare cases is 20% higher than that of the existing method.

This paper proposes a new SVM method that achieves good recognition performance for rare cases at the cost of reduced performance for majority cases. The proposed classification method is effective on imbalanced datasets.
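The training procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the paper solves a quadratic program with MATLAB's quadprog, whereas this sketch substitutes projected subgradient descent on the soft-margin primal and clips $w$ into the box $[A, B]$ after every step. All function names, the per-attribute treatment of $w_s$ and $\sigma$, the hyperparameters, and the toy data are assumptions made for illustration only; none of them come from the paper or from the CSMAR data.

```python
"""Sketch: linear soft-margin SVM with box constraints A <= w <= B.

Hypothetical helper names; projected subgradient descent stands in
for the quadratic-programming solver used in the paper.
"""


def flip_negative_attributes(X, w):
    """Negate every attribute column whose trained weight is negative."""
    signs = [-1.0 if wj < 0 else 1.0 for wj in w]
    return [[s * v for s, v in zip(signs, row)] for row in X]


def weight_bounds(ws, sigma, lam, w_sum, clip_negative=False):
    """Return (A, B): A_j = (ws_j - lam*sigma_j)*w_sum, B_j = (ws_j + lam*sigma_j)*w_sum.

    Treating ws and sigma as per-attribute lists is an assumption.
    """
    lower = [(s - lam * sd) * w_sum for s, sd in zip(ws, sigma)]
    upper = [(s + lam * sd) * w_sum for s, sd in zip(ws, sigma)]
    if clip_negative:  # second experiment in the abstract: negative a_j set to 0
        lower = [max(0.0, a) for a in lower]
    return lower, upper


def train_box_svm(X, y, lower, upper, C=1.0, epochs=2000, step=0.01):
    """Minimise 0.5*||w||^2 + C*sum(hinge) subject to lower <= w <= upper."""
    n_features = len(X[0])
    # Start from the projection of the zero vector onto the box.
    w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    b = 0.0
    for t in range(1, epochs + 1):
        eta = step / t ** 0.5          # diminishing step size
        grad_w = list(w)               # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:           # hinge loss is active for this sample
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        # Gradient step followed by projection (clipping) onto the box.
        w = [min(max(wj - eta * gj, lo), hi)
             for wj, gj, lo, hi in zip(w, grad_w, lower, upper)]
        b -= eta * grad_b
    return w, b


def predict(X, w, b):
    """Label +1 (minority / risk class) or -1 (majority class)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


# Toy imbalanced data (invented): 2 minority samples, 4 majority samples.
X_demo = [[2.0, 2.0], [3.0, 2.0],
          [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0], [-3.0, -3.0]]
y_demo = [1, 1, -1, -1, -1, -1]

A, B = weight_bounds(ws=[0.5, 0.5], sigma=[0.1, 0.1], lam=2.0, w_sum=1.0)
w_fit, b_fit = train_box_svm(X_demo, y_demo, A, B)
labels = predict(X_demo, w_fit, b_fit)
```

Increasing `lam` widens the box, so the constraint binds less and the fitted vector moves toward the unconstrained SVM solution, which mirrors the behaviour the abstract reports for large $\lambda$. Passing `clip_negative=True` corresponds to the second robustness experiment.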
Keywords: constraint of subjective weights; imbalanced dataset; objective weights; integrated vector; SVM
This article is indexed in CNKI, Wanfang Data, and other databases.