Stylometric analyses using Dirichlet process mixture models
Authors: Paramjit S. Gill (a), Tim B. Swartz (b)
Institutions: (a) I.K. Barber School of Arts and Sciences, University of British Columbia Okanagan, Kelowna, BC, Canada V1V 1V7
(b) Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
Abstract: Stylometry is the statistical analysis of the literary style of authors, based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of the word frequency data are the “word prints” of the author. Our approach relies on model-based clustering of the probability vectors of the multinomial distribution. The resulting clusters identify different writing styles, which assist in attributing authorship of disputed works in a corpus. As a test case, the methodology is applied to the authorship attribution problem for the Federalist papers. Our results are consistent with previous stylometric analyses of these papers.
Keywords: Bayesian methods; Clustering; Computational linguistics; Dirichlet process priors; Disputed authorship; Federalist papers; Multinomial distribution
This article is indexed in ScienceDirect and other databases.
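
To make the clustering idea in the abstract concrete, the following is a minimal sketch of a Dirichlet process mixture over multinomial word counts, fitted with a textbook collapsed Gibbs sampler (Chinese restaurant process form). It is an illustration under stated assumptions, not the authors' model or code: the symmetric Dirichlet base measure, the hyperparameters `alpha` and `beta`, the function names, and the toy word counts are all hypothetical and are not taken from the paper or from the Federalist data.

```python
import math
import random

def log_predictive(x, cluster_counts, beta):
    """Log Dirichlet-multinomial predictive of count vector x given the pooled
    counts of a cluster, under a symmetric Dirichlet(beta) prior (assumed).
    The multinomial coefficient is dropped: it is the same for every cluster."""
    a = [beta + c for c in cluster_counts]
    total_a, m = sum(a), sum(x)
    lp = math.lgamma(total_a) - math.lgamma(total_a + m)
    for a_v, x_v in zip(a, x):
        lp += math.lgamma(a_v + x_v) - math.lgamma(a_v)
    return lp

def dp_cluster(docs, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Collapsed Gibbs sampling for a DP mixture of multinomials.
    docs: list of equal-length word-count vectors. Returns one cluster label
    per document, taken from the last sweep (a single posterior draw)."""
    rng = random.Random(seed)
    V = len(docs[0])
    labels = [0] * len(docs)  # start with all documents in one cluster
    counts = {0: [sum(d[v] for d in docs) for v in range(V)]}
    sizes = {0: len(docs)}
    for _ in range(sweeps):
        for i, x in enumerate(docs):
            # Remove document i from its current cluster.
            k = labels[i]
            sizes[k] -= 1
            counts[k] = [c - x_v for c, x_v in zip(counts[k], x)]
            if sizes[k] == 0:
                del sizes[k], counts[k]
            # Weight of each existing cluster: size times predictive density.
            options = list(sizes) + [None]  # None stands for a new cluster
            logw = [math.log(sizes[j]) + log_predictive(x, counts[j], beta)
                    for j in sizes]
            # Weight of a new cluster: alpha times the prior predictive.
            logw.append(math.log(alpha) + log_predictive(x, [0] * V, beta))
            top = max(logw)
            weights = [math.exp(l - top) for l in logw]
            choice = rng.choices(options, weights=weights)[0]
            if choice is None:
                choice = max(counts, default=-1) + 1  # fresh, unused label
            # Add document i to the chosen cluster.
            labels[i] = choice
            sizes[choice] = sizes.get(choice, 0) + 1
            base = counts.get(choice, [0] * V)
            counts[choice] = [c + x_v for c, x_v in zip(base, x)]
    return labels

# Toy data (invented): counts of three function words in six short texts,
# the first three written in one "style" and the last three in another.
docs = [[50, 3, 2], [52, 2, 1], [48, 4, 3],
        [2, 3, 50], [1, 2, 52], [3, 4, 48]]
labels = dp_cluster(docs)
print(labels)
```

In this toy setting, texts that share a cluster label are read as sharing a writing style, so a disputed text would be attributed to the author whose known texts fall in the same cluster. The number of clusters is not fixed in advance; it is governed by the concentration parameter `alpha` and by the data.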