Stylometric analyses using Dirichlet process mixture models |
| |
Authors: | Paramjit S Gill, Tim B Swartz |
| |
Institution: | (a) I.K. Barber School of Arts and Sciences, University of British Columbia Okanagan, Kelowna, BC, Canada V1V 1V7; (b) Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC, Canada V5A 1S6 |
| |
Abstract: | Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the “word prints” of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers. |
| |
Keywords: | Bayesian methods; Clustering; Computational linguistics; Dirichlet process priors; Disputed authorship; Federalist papers; Multinomial distribution |
This article is indexed in ScienceDirect and other databases. |
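
The abstract describes an author's "word print" as the parameter vector of a multinomial distribution over word frequencies. As a rough illustration only, the sketch below estimates such a vector for one text. It is not the paper's Dirichlet process mixture model: the marker-word list `MARKER_WORDS`, the function name `word_print`, and the symmetric Dirichlet smoothing constant `alpha` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical marker-word list, chosen only for illustration;
# the paper's actual vocabulary is not given in this record.
MARKER_WORDS = ["by", "to", "upon", "while"]


def word_print(text, vocab=MARKER_WORDS, alpha=1.0):
    """Estimate multinomial word probabilities for one text.

    Returns the posterior mean under a symmetric Dirichlet(alpha) prior,
    an assumed smoothing choice that keeps unseen words above zero.
    """
    # Count only the words that belong to the fixed vocabulary.
    counts = Counter(w for w in text.lower().split() if w in vocab)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


sample = "upon reflection it is left to the people to decide"
probs = word_print(sample)
```

Vectors like `probs`, computed per document, are the kind of objects that a model-based clustering step would then group into writing styles.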