
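Feature Selection with a Penalized Multinomial Mixture Model in Text Clustering and Its Application to Internet Public Opinion Analysis
Cite as: XUE Wei, CHEN Huan-ge. Feature Selection with a Penalized Multinomial Mixture Model in Text Clustering and Its Application to Internet Public Opinion Analysis[J]. Statistics & Information Tribune (统计与信息论坛), 2012(1): 9-14.
Authors: XUE Wei (薛薇), CHEN Huan-ge (陈欢歌)
Affiliations: Center for Applied Statistics, Renmin University of China; School of Statistics, Renmin University of China; Department of Computer Science, Bailie University of Beijing (北京培黎职业学院)
Abstract: Feature selection on high-dimensional sparse data is the key to clustering internet public opinion texts. Drawing on the idea of penalized models, a penalized multinomial mixture model performs feature selection by imposing heavier penalties on features that do not significantly affect the clustering result. The model can effectively select the typical words that represent each class of opinion, and it performs reasonably well in an empirical application.

Keywords: feature selection; mixture model; text clustering; public opinion analysis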

Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions
XUE Wei, CHEN Huan-ge. Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions[J]. Statistics & Information Tribune, 2012(1): 9-14.
Authors: XUE Wei (1a, 1b), CHEN Huan-ge
Institution: 1. a. Center for Applied Statistics; b. Statistics School, Renmin University of China, Beijing 100872, China; 2. The Department of Computer Science, Bailie University of Beijing, Beijing 100085, China
Abstract: Feature selection on high-dimensional sparse text data is the key step in clustering internet public opinions. By imposing heavier penalties on features that have no significant effect on the clustering result, a penalized multinomial mixture model based on the norm-penalty idea can effectively select the typical words that represent the various types of public opinion. The model performs reasonably well in an empirical application.
Keywords: feature selection; mixture model; text clustering; public opinion analysis
This article is indexed in CNKI and other databases.