
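Research on a Word Segmentation Method for Subject Extraction from Chinese Text
Cite this article: Tang Peili, Hu Ming, Zhang Yong. Research on a word segmentation method for subject extraction from Chinese text [J]. Journal of Jilin Teachers Institute of Engineering and Technology, 2005(2).
Authors: Tang Peili, Hu Ming, Zhang Yong
Affiliations: School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130012 (Tang Peili, Hu Ming); School of Information Engineering, Jilin Teachers Institute of Engineering and Technology, Changchun, Jilin 130052 (Zhang Yong)
Abstract: Subject extraction from Chinese text helps users condense and distill massive amounts of information. Chinese word segmentation is the first step of subject extraction, and the quality of segmentation directly affects the quality of the extracted subjects. This paper proposes a word segmentation method designed for Chinese text subject extraction. The method uses a concept semantic network as the segmentation dictionary, segments the text with an improved maximum matching algorithm, and normalizes the subject terms in the same pass.

Keywords: subject extraction; Chinese word segmentation; maximum matching algorithm; ambiguous segmentation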
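The abstract does not say what the "improved" maximum matching algorithm changes, so the following is only a minimal sketch of the baseline it builds on: plain forward maximum matching, assuming the concept semantic network can be flattened into a dictionary that maps each surface word to the canonical subject term of its concept. The names `CONCEPT_NET` and `segment_and_normalize`, and the toy entries, are hypothetical. Normalization is applied as each word is cut, echoing the abstract's point that subject-term normalization is completed during segmentation.

```python
from typing import Dict, List

# Hypothetical toy stand-in for the paper's concept semantic network:
# each surface word maps to the canonical subject term of its concept.
CONCEPT_NET: Dict[str, str] = {
    "中文": "中文",
    "汉语": "中文",        # synonym mapped to the same concept
    "分词": "分词",
    "切词": "分词",        # synonym mapped to the same concept
    "主题": "主题",
    "提取": "提取",
    "主题提取": "主题提取",
    "方法": "方法",
}


def segment_and_normalize(text: str, concept_net: Dict[str, str]) -> List[str]:
    """Forward maximum matching over the concept network's vocabulary.

    Each matched word is replaced by its canonical term as soon as it is cut,
    so segmentation and normalization happen in a single left-to-right pass.
    Characters not covered by the dictionary are emitted one at a time.
    """
    max_len = max(len(w) for w in concept_net)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in concept_net:
                tokens.append(concept_net[piece])  # normalize to canonical term
                i += size
                break
            if size == 1:
                tokens.append(piece)  # out-of-vocabulary single character
                i += 1
    return tokens
```

For example, `segment_and_normalize("汉语切词方法", CONCEPT_NET)` cuts 汉语 / 切词 / 方法 and returns `["中文", "分词", "方法"]`, with both synonyms mapped to their canonical terms. Because the longest window is tried first, 主题提取 is kept as one term rather than split into 主题 / 提取.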

Research of algorithm from Chinese texts thematic words extraction based on semantic
Tang Peili, Hu Ming, Zhang Yong. Research of algorithm from Chinese texts thematic words extraction based on semantic[J]. Journal of Jilin Teachers Institute of Engineering and Technology (Natural Sciences Edition), 2005(2).
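Authors: Tang Peili¹, Hu Ming¹, Zhang Yong²
Institutions: 1. School of Computer Science and Engineering, Changchun University of Technology, Changchun, Jilin 130012; 2. School of Information Engineering, Jilin Teachers Institute of Engineering and Technology, Changchun, Jilin 130052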
Abstract: Chinese text subject extraction helps users condense vast amounts of information. Chinese word segmentation is always the first step of subject extraction, and the quality of word segmentation directly affects the quality of text subject extraction. This paper puts forward a word segmentation method based on text subject extraction. It uses an improved maximum matching (MM) segmentation algorithm, constructs a concept semantic network as the dictionary, and standardizes thematic words. In theory, it outperforms other Chinese word segmentation algorithms in resolving ambiguous segmentation.
Keywords: subject extraction; Chinese word segmentation; maximum matching method; ambiguity partition
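The English abstract claims an advantage on ambiguous segmentation but does not describe the mechanism. As background only, a common textbook way to detect such ambiguity is bidirectional maximum matching: run forward (FMM) and backward (BMM) maximum matching and flag the input when the two disagree. The sketch below shows that generic technique, not the authors' improved algorithm; the function names, the toy lexicon, and the tie-break rule (fewer words, then fewer single-character words) are assumptions.

```python
from typing import List, Set, Tuple


def fmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: scan left to right, longest match first."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                words.append(text[i:i + size])
                i += size
                break
    return words


def bmm(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Backward maximum matching: scan right to left, longest match first."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            if size == 1 or text[j - size:j] in lexicon:
                words.append(text[j - size:j])
                j -= size
                break
    words.reverse()  # restore left-to-right order
    return words


def cost(words: List[str]) -> Tuple[int, int]:
    """Assumed tie-break key: fewer words first, then fewer single characters."""
    return (len(words), sum(1 for w in words if len(w) == 1))


def bidirectional_mm(text: str, lexicon: Set[str]) -> Tuple[List[str], bool]:
    """Return (chosen segmentation, ambiguity flag).

    The flag is True when FMM and BMM disagree; the lower-cost result is
    chosen, with BMM preferred on a tie.
    """
    max_len = max(len(w) for w in lexicon)
    forward, backward = fmm(text, lexicon, max_len), bmm(text, lexicon, max_len)
    if forward == backward:
        return forward, False
    return (forward if cost(forward) < cost(backward) else backward), True
```

With the lexicon {研究, 研究生, 生命, 起源}, the input 研究生命起源 is cut as 研究生 / 命 / 起源 by FMM and as 研究 / 生命 / 起源 by BMM; the disagreement flags an ambiguity, and the tie-break selects the BMM result because it contains no single-character word.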
This article is indexed in CNKI and other databases.