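A Rapid Method for Text Tendency Classification (article in English)
Cite this article: LI Yan-ling, DAI Guan-zhong, QIN Sen. A Rapid Method for Text Tendency Classification[J]. Journal of University of Electronic Science and Technology of China (Social Sciences Edition), 2007, 0(6)
Authors: LI Yan-ling, DAI Guan-zhong, QIN Sen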
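Affiliations: 1. School of Automation, Northwestern Polytechnical University, Xi'an 710072; 2. Xi'an High-Tech Research Institute, Xi'an 710025 (LI Yan-ling 1,2; DAI Guan-zhong 1; QIN Sen 1)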
Funding: National High Technology Research and Development Program of China (863 Program, 2005AA147030)
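Abstract: A rapid method for text tendency classification is proposed. A class space model describes the tendency of each word toward the categories, and classification is performed on the basis of the words' statistical features. To cope with the complexity of tendency classification, a new twice (two-stage) feature selection method is proposed that jointly considers three statistical features of words: term frequency, document frequency, and distribution across categories. The first stage uses a combined feature selection method to remove low-frequency words and noise words that are uniformly distributed across all categories; the second stage removes words whose category tendency is not obvious. Experiments show that the method not only achieves high classification performance but also runs fast, and it has practical value for information retrieval, information filtering, and content security management.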
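The abstract names the ingredients of the method (a class space model of per-word category weights, three word statistics, and a two-stage filter) but not the exact formulas or thresholds. The following is a minimal Python sketch of such a pipeline under stated assumptions, not the authors' implementation. It assumes that a word's category weight is its size-normalized relative frequency in each category, that "uniformly distributed" is tested with the normalized entropy of per-category document-frequency rates, that "category tendency not obvious" means a small gap between the two largest category weights, and that a document goes to the category with the highest sum of its words' weights. All function names (`build_stats`, `select_features`, `classify`, etc.) and thresholds (`min_tf`, `min_df`, `max_entropy`, `min_margin`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch: the weighting formula, uniformity test, tendency test,
# and thresholds below are illustrative assumptions, not the paper's definitions.


def build_stats(docs, labels):
    """Collect per-category term frequency, document frequency, and sizes."""
    cats = sorted(set(labels))
    tf = defaultdict(Counter)  # tf[word][cat]: occurrences of word in category
    df = defaultdict(Counter)  # df[word][cat]: documents of cat containing word
    n_docs = Counter(labels)   # number of documents per category
    n_tokens = Counter()       # number of tokens per category
    for tokens, cat in zip(docs, labels):
        n_tokens[cat] += len(tokens)
        for word, count in Counter(tokens).items():
            tf[word][cat] += count
            df[word][cat] += 1
    return cats, tf, df, n_docs, n_tokens


def normalized_entropy(p):
    """Entropy of distribution p divided by its maximum; 1.0 means uniform."""
    if len(p) < 2:
        return 0.0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(len(p))


def class_vector(word, cats, tf, n_tokens):
    """Assumed class space vector: size-normalized frequency per category, summing to 1."""
    rel = [tf[word][c] / n_tokens[c] if n_tokens[c] else 0.0 for c in cats]
    total = sum(rel)
    return [r / total for r in rel]


def select_features(docs, labels, min_tf=2, min_df=2, max_entropy=0.95, min_margin=0.2):
    """Two-stage selection; returns categories and {word: class space vector}."""
    cats, tf, df, n_docs, n_tokens = build_stats(docs, labels)

    # Stage 1 (combined criteria): drop low-frequency words, then words spread
    # evenly over the categories.
    stage1 = []
    for word in tf:
        if sum(tf[word].values()) < min_tf or sum(df[word].values()) < min_df:
            continue
        # Share of each category's documents that contain the word.
        rates = [df[word][c] / n_docs[c] for c in cats]
        total = sum(rates)
        if normalized_entropy([r / total for r in rates]) >= max_entropy:
            continue
        stage1.append(word)

    # Stage 2: drop words whose category tendency is weak (small top-two gap).
    weights = {}
    for word in stage1:
        vec = class_vector(word, cats, tf, n_tokens)
        top = sorted(vec, reverse=True) + [0.0]
        if top[0] - top[1] >= min_margin:
            weights[word] = vec
    return cats, weights


def classify(tokens, cats, weights):
    """Sum the class space vectors of selected words; ties go to the first category."""
    scores = [0.0] * len(cats)
    for word, count in Counter(tokens).items():
        for i, w in enumerate(weights.get(word, ())):
            scores[i] += count * w
    return cats[max(range(len(cats)), key=lambda i: scores[i])]


# Toy usage with made-up sentiment data.
train = [
    "good great excellent movie".split(),
    "great good fun plot".split(),
    "bad awful boring movie".split(),
    "awful bad dull plot".split(),
]
labels = ["pos", "pos", "neg", "neg"]
cats, weights = select_features(train, labels)
print(sorted(weights))                                      # → ['awful', 'bad', 'good', 'great']
print(classify("a great fun film".split(), cats, weights))  # → pos
```

With these toy thresholds, single-occurrence words such as "excellent" are dropped as low-frequency, and "movie" and "plot" are dropped in stage 1 because they occur equally often in both classes. Because classification only sums precomputed per-word vectors, its cost is linear in document length. This is one plausible reason a model of this kind can run fast, although the abstract does not give the authors' own analysis.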

Keywords: category weight; class space model; text tendency classification; twice feature selection

This article is indexed in CNKI and other databases.