On an Annotation Specification for Contemporary Mongolian Corpora
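Cite this article: Tonglaga. On an Annotation Specification for Contemporary Mongolian Corpora [J]. Journal of Inner Mongolia University for Nationalities (Social Sciences), 2014, 0(4): 35-40
Author: Tonglaga
Affiliation: Library, Quanzhou Normal University, Quanzhou, Fujian 362000
Funding: One of the outcomes of the National Social Science Fund of China project "A Study on the Development of Chinese Minority Languages on the Internet" (Project No. 11CYY016).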
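Abstract: Annotation of contemporary Mongolian corpora is currently guided by the Specifications for Contemporary Mongolian Corpus Annotation, but that document leaves gaps in the transliteration rules for non-Mongolian characters, proper nouns and loanwords, and has not yet worked out in detail how units other than single words are to be annotated. With information processing as its goal, the present specification draws on the characteristics and regularities of contemporary Mongolian to establish rules for merging or segmenting the tagging units of a Mongolian corpus. It follows Mongolian Word and Expression Marks for Information Processing, drafted jointly by the China Electronics Standardization Institute and other organizations, and Inner Mongolia University's Specifications for Contemporary Mongolian Corpus Annotation. The specification will need continued refinement on the basis of large-scale corpora.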

Keywords: information processing; contemporary Mongolian; annotation specification
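The abstract does not state the merging and segmentation rules themselves. As a minimal sketch of the kind of operation such rules govern, the Python below makes two loud assumptions that are not taken from the specification: (1) suffixes are attached to stems with U+202F NARROW NO-BREAK SPACE, as is common in Unicode-encoded traditional Mongolian, and (2) multi-word units come from a hypothetical lexicon. It first splits suffixes off as separate units, then merges listed multi-word units by forward maximum matching. The names `split_suffixes`, `merge_units`, `segment`, `LEXICON` and the `_` joiner are illustrative only.

```python
# Minimal sketch under stated assumptions; NOT the specification's actual rules.
# Assumption 1: suffixes are attached to stems with U+202F NARROW NO-BREAK SPACE.
# Assumption 2: multi-word units come from a hypothetical lexicon of token tuples.
from typing import List, Set, Tuple

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Hypothetical lexicon (Latin transliteration): "Inner Mongolia", "university".
LEXICON: Set[Tuple[str, ...]] = {
    ("öbür", "mongγol"),
    ("yeke", "surγaγuli"),
}


def split_suffixes(token: str) -> List[str]:
    """Split a stem from any NNBSP-attached suffixes, one tagging unit each."""
    return [part for part in token.split(NNBSP) if part]


def merge_units(tokens: List[str], lexicon: Set[Tuple[str, ...]],
                max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position, merge the longest run of
    tokens (up to max_len) listed in the lexicon; otherwise keep one token."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        span = 1
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                span = n
                break
        out.append("_".join(tokens[i:i + span]))  # "_" joiner is illustrative
        i += span
    return out


def segment(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Split suffixes first, then merge multi-word units."""
    pieces = [p for tok in tokens for p in split_suffixes(tok)]
    return merge_units(pieces, lexicon)


if __name__ == "__main__":
    # "öbür mongγol-un yeke surγaγuli" (Inner Mongolia University)
    sentence = ["öbür", "mongγol" + NNBSP + "un", "yeke", "surγaγuli"]
    print(segment(sentence, LEXICON))
    # -> ['öbür_mongγol', 'un', 'yeke_surγaγuli']
```

Splitting before merging is a design choice of this sketch: it lets the lexicon list bare stems, so one entry covers every inflected form. A real specification may order or scope these steps differently.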

On Contemporary Mongolian Segmentation Specification for Information Processing
Tonglaga. On Contemporary Mongolian Segmentation Specification for Information Processing[J]. Journal of Inner Mongolia University for Nationalities (Social Sciences), 2014, 0(4): 35-40
Authors: Tonglaga
Abstract: Although the annotation of contemporary Mongolian corpora can be conducted under the guidance of Specifications for Contemporary Mongolian Corpus Annotation, those specifications leave gaps in the transliteration of non-Mongolian characters, proper nouns and loanwords, and do not further detail the tagging of word units with more than one character. For the purpose of information processing, the present specifications study the rules for merging and segmenting Mongolian corpus tagging units on the basis of the linguistic features and regularities of Mongolian. The study follows the rules in Mongolian Word and Expression Marks for Information Processing, jointly drafted by the China Electronics Standardization Institute and other organizations, and Specifications for Contemporary Mongolian Corpus Annotation, drawn up by Inner Mongolia University. The specifications will need further improvement as large-scale corpora are developed.
Keywords: information processing; contemporary Mongolian language; tokenization specification
This article is indexed in CNKI, VIP, Wanfang Data and other databases.