
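Research on Feature Selection Methods for Chinese Spam Filtering with Multi-feature Combination
Cite as: ZHAO Junsheng, SU Yila. Research on Feature Selection Methods for Chinese Spam Filtering with Multi-feature Combination [J]. 内蒙古工业大学学报, 2013(3): 209-213.
Authors: ZHAO Junsheng, SU Yila
Affiliation: College of Information Engineering, Inner Mongolia University of Technology, Hohhot, Inner Mongolia 010080
Funding: Inner Mongolia University of Technology fund project (X200801).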
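Abstract: In Chinese spam filtering systems, the content-based Naïve Bayes algorithm is widely used. This paper combines multiple features to build the e-mail text vector, applies eight text classification feature selection methods with the Naïve Bayes algorithm for experimental verification, and evaluates performance with the F1 value, a composite indicator that combines precision and recall. The results show that six of these methods (category-distinguishing words, odds ratio, information gain, expected cross entropy, the CHI statistic, and weight of evidence for text) achieve good spam filtering performance when applied to the multi-feature e-mail text vector.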

Keywords: mail vector; spam filtering; feature selection; Naive Bayes algorithm; F1

The Research of the Chinese Spam Filtering Feature Selection Method by Multi-feature Combination
ZHAO Junsheng, SU Yila. The Research of the Chinese Spam Filtering Feature Selection Method by Multi-feature Combination [J]. Journal of Inner Mongolia Polytechnic University (Social Sciences Edition), 2013(3): 209-213.
Authors: ZHAO Junsheng, SU Yila
Institution: College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080
Abstract: In Chinese spam filtering systems, the content-based Naive Bayes algorithm has been widely used. This paper combines multiple features to build the e-mail text vector and applies eight text classification feature selection methods with the Naive Bayes algorithm for experimental verification. Performance is evaluated with the F1 value, a composite indicator that combines precision and recall. The results show that six of the feature selection methods (category-distinguishing words, odds ratio, information gain, expected cross entropy, the CHI statistic, and weight of evidence for text) achieve good spam filtering performance when applied to the multi-feature e-mail text vector.
Keywords: mail vector; spam filtering; feature selection; Naive Bayes algorithm; F1 value
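
The abstract names the F1 value and the CHI statistic but does not give their formulas. As a minimal sketch, assuming the standard textbook definitions (the paper's exact variants may differ), the two quantities can be computed from simple counts. The function names `f1_score` and `chi_square` and the count arguments are hypothetical and chosen only for illustration; spam is treated as the positive class.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 value: harmonic mean of precision and recall (assumed standard form).

    tp: spam messages correctly flagged as spam
    fp: legitimate messages wrongly flagged as spam
    fn: spam messages that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI statistic of one term against the spam class (assumed 2x2 form).

    a: spam messages containing the term
    b: legitimate messages containing the term
    c: spam messages without the term
    d: legitimate messages without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Made-up counts, purely illustrative; not data from the paper.
    print(round(f1_score(tp=8, fp=2, fn=2), 3))
    print(chi_square(a=40, b=10, c=10, d=40))
```

Under this reading, a feature selection step would rank candidate terms by a score such as `chi_square` and keep the highest-scoring ones as dimensions of the e-mail text vector, while `f1_score` summarizes how well the resulting filter performs.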
This article is indexed in databases including 维普 (VIP).