Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions



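Cite this article:	XUE Wei (薛薇), CHEN Huan-ge (陈欢歌). Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions [J]. Statistics & Information Tribune (统计与信息论坛), 2012(1): 9-14.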

Authors:	XUE Wei (薛薇), CHEN Huan-ge (陈欢歌)

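Affiliations:	Center for Applied Statistics, Renmin University of China; School of Statistics, Renmin University of China; Department of Computer Science, Bailie University of Beijing (北京培黎职业学院)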

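Abstract:	Feature selection for high-dimensional sparse data is the key to text clustering analysis of internet public opinion. Drawing on the idea of penalized models, a penalized multinomial mixture model performs feature selection by imposing heavier penalties on features that do not significantly affect the clustering result. This effectively selects the typical words that represent the various viewpoints within public opinion, and the model performs fairly well in an empirical application.
Keywords:	feature selection; mixture model; text clustering; public opinion analysis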
Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions

XUE Wei, CHEN Huan-ge. Penalized Multinomial Mixture Model with Feature Selection for Text Clustering and its Application on Internet Public Opinions [J]. Statistics & Information Tribune, 2012(1): 9-14.

Authors:	XUE Wei (a, b), CHEN Huan-ge

Institutions:	1. (a) Center for Applied Statistics, (b) School of Statistics, Renmin University of China, Beijing 100872, China; 2. Department of Computer Science, Bailie University of Beijing, Beijing 100085, China

Abstract:	Feature selection on high-dimensional sparse text data is the key step in clustering internet public opinions. By imposing a heavier penalty on features that have no significant effect on the clustering result, a penalized multinomial mixture model based on the norm-penalty idea can effectively select the typical phrases that represent the various types of public opinion. The model performs fairly well in an empirical application.

Keywords:	feature selection; mixture model; text clustering; public opinion analysis
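
The abstract describes the approach only in outline, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes an EM fit of a multinomial mixture in which each cluster's word log-probabilities are expressed as deviations from the corpus-wide distribution, and it uses soft-thresholding of those deviations as a simple stand-in for a norm penalty (it is not an exact penalized M-step). All names here (`soft_threshold`, `fit_shrunken_multinomial_mixture`, `selected_features`, `lam`, `smooth`) are hypothetical and do not come from the paper.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; entries within [-lam, lam] become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def fit_shrunken_multinomial_mixture(X, n_clusters, lam, n_iter=30, smooth=0.01):
    """EM for a multinomial mixture with a shrinkage step on word effects.

    X is an (n_docs, n_words) matrix of term counts. Returns (resp, delta):
    soft cluster assignments and the shrunken per-cluster word effects.
    This is an assumed, simplified scheme, not the paper's estimator.
    """
    X = np.asarray(X, dtype=float)
    n_docs, n_words = X.shape
    # Corpus-wide word distribution: the baseline each cluster is shrunk toward.
    log_global = np.log((X.sum(axis=0) + smooth) / (X.sum() + smooth * n_words))

    # Deterministic start: seed clusters with mutually distant documents.
    freq = X / np.maximum(X.sum(axis=1, keepdims=True), 1.0)
    seeds = [0]
    while len(seeds) < n_clusters:
        dist = np.abs(freq[:, None, :] - freq[seeds][None, :, :]).sum(axis=2)
        seeds.append(int(dist.min(axis=1).argmax()))
    resp = np.zeros((n_docs, n_clusters))
    resp[seeds, np.arange(n_clusters)] = 1.0

    for _ in range(n_iter):
        # M-step: mixing weights and smoothed cluster word probabilities.
        weights = resp.sum(axis=0) / resp.sum()
        counts = resp.T @ X + smooth
        theta = counts / counts.sum(axis=1, keepdims=True)
        # Shrinkage: small deviations from the baseline are set exactly to 0.
        delta = soft_threshold(np.log(theta) - log_global, lam)
        log_theta = log_global + delta
        log_theta -= np.log(np.exp(log_theta).sum(axis=1, keepdims=True))
        # E-step: posterior cluster probabilities, computed in log space.
        log_post = np.log(weights + 1e-12) + X @ log_theta.T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, delta


def selected_features(delta):
    """Words whose shrunken effect is nonzero in at least one cluster."""
    return np.any(delta != 0.0, axis=0)
```

In this sketch a word counts as selected when its effect survives shrinkage in at least one cluster. Words whose frequency is about the same in every cluster end up with all-zero effects and drop out, which mirrors the abstract's idea of penalizing features that do not affect the clustering. The threshold `lam` would have to be tuned for real data.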
This article is indexed in CNKI and other databases.
