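A Comparison of Automatic Classification Models Based on Corpora and on Indexing Experience
Cite this article: XUE Chun-xiang, XIA Zu-qi, HOU Han-qing. A comparison of automatic classification between corpus-based model and experiences-based model[J]. Journal of Nanjing Agricultural University (Social Science Edition), 2005, 5(4): 85-92
Authors: XUE Chun-xiang, XIA Zu-qi, HOU Han-qing
Affiliations: 1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu 210095
2. Trend Micro China R&D Center, Nanjing, Jiangsu 210008
Funding: College-level funded project of the College of Information Science and Technology, Nanjing Agricultural University; National Social Science Fund of China project (02BTQ012)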
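Abstract: This article introduces and compares two models of automatic classification, one based on training corpora and one based on manual indexing experience, in terms of principle, system design, knowledge base construction, classification algorithm, and performance. Both models are feasible to a certain degree. The corpus-based model relies entirely on machine learning to discover category features from the training set. It is mathematically well founded and easy to maintain, and it is better suited to coarse classification by industry or topic. However, it overemphasizes the power of machine learning and neglects the results of human intellectual labor, and both its training and classification stages are computationally expensive and algorithmically complex. The indexing-experience-based model uses simple statistical methods to mine manual indexing experience from bibliographic databases. It is suited to detailed classification under a hierarchical classification scheme, its algorithm is simple, and its computational cost is low, but it depends too heavily on experience and lacks a convincing mathematical justification. The completeness and soundness of the knowledge base is the main factor affecting the classification performance of both models, and it is a problem they share.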

Keywords: automatic classification; automatic indexing; corpus; knowledge base; Chinese Library Classification
Article ID: 1671-7465(2005)04-0085-08
Revision received: 2005-10-21

A comparison of automatic classification between corpus-based model and experiences-based model
XUE Chun-xiang,XIA Zu-qi,HOU Han-qing. A comparison of automatic classification between corpus-based model and experiences-based model[J]. Journal of Nanjing Agricultural University(Social Science Edition), 2005, 5(4): 85-92
Authors:XUE Chun-xiang  XIA Zu-qi  HOU Han-qing
Affiliation: XUE Chun-xiang (1), XIA Zu-qi (2), HOU Han-qing (1)
Abstract: This article describes and compares two models of automatic classification, a corpus-based model and an experience-based model, in terms of theory, system design, knowledge base construction, classification algorithm, and performance. The comparison suggests that both models are feasible for the automatic classification of Chinese information. The former relies entirely on machine learning over large-scale corpora to construct the knowledge base, so it is mathematically well founded and easy to maintain automatically; however, its computational cost is high and its classification scheme is broad and shallow, with only two or three levels. The latter relies entirely on indexing experience, extracted by statistical methods, to construct the knowledge base; its classification algorithm is simple and its classification hierarchy is deep, but it lacks a sound mathematical justification. The completeness and soundness of the knowledge base directly affect the classification results, so ensuring them is the most important problem for both models.
Keywords:automatic classification  automatic indexing  text corpus  knowledge base  Chinese Library Classification
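
As a rough illustration of the experience-based model described in the abstract, the sketch below counts how often terms co-occur with class numbers in human-indexed records and then scores a new document by summing each term's relative class frequencies. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `build_knowledge_base` and `classify`, the scoring rule, and the toy records with CLC-style labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_knowledge_base(records):
    """Count how often each term co-occurs with each class number.

    `records` is an iterable of (terms, class_number) pairs, standing in
    for human-indexed entries from a bibliographic database (assumption).
    """
    kb = defaultdict(Counter)
    for terms, class_number in records:
        for term in set(terms):  # count each term once per record
            kb[term][class_number] += 1
    return kb


def classify(kb, terms):
    """Return the class with the highest summed relative frequency.

    Each known term votes for its classes in proportion to how often
    indexers assigned that class to records containing the term.
    Returns None when no term is found in the knowledge base.
    """
    scores = Counter()
    for term in set(terms):
        counts = kb.get(term)
        if not counts:
            continue  # term never seen in the indexed records
        total = sum(counts.values())
        for class_number, n in counts.items():
            scores[class_number] += n / total
    if not scores:
        return None
    return scores.most_common(1)[0][0]


# Toy records; the CLC-style labels are illustrative only.
records = [
    (["rice", "cultivation"], "S511"),
    (["rice", "breeding"], "S511"),
    (["wheat", "cultivation"], "S512"),
    (["library", "classification"], "G254"),
]
kb = build_knowledge_base(records)
print(classify(kb, ["rice", "cultivation", "techniques"]))
```

The knowledge base here is only a table of counts, which mirrors the abstract's point that this model is computationally cheap but depends on how complete the indexed records are: a document whose terms never appear in the records cannot be classified at all.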
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.