Semi-parametric analysis of multi-rater data |
| |
Authors: | Simon Rogers, Mark Girolami, Tamara Polajnar |
| |
Institution: | (1) Department of Chemistry, Zhejiang University, 310027 Hangzhou, China; (2) Institute of Agricultural and Life Sciences, Chongqing University, 400044 Chongqing, China |
| |
Abstract: | Datasets that are subjectively labeled by a number of experts are becoming more common in tasks such as biological text annotation
where class definitions are necessarily somewhat subjective. Standard classification and regression models are not designed to handle
multiple labels per instance, so a pre-processing step (normally assigning the majority class) is typically performed. We propose Bayesian
models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive
distributions. The models use Gaussian process priors, which give them great flexibility and make them particularly suitable for
text-based problems, where the number of covariates can be far greater than the number of data instances. We show that using
all labels rather than just the majority label improves performance on a recent biological dataset. |
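The abstract contrasts collapsing each instance's labels to the majority class with using every expert's label. The toy sketch below illustrates that difference for a single instance with a latent value f. It assumes binary labels in {+1, -1}, a probit link, and a likelihood that treats each rater's label as an independent observation of the same f. These assumptions and all function names are hypothetical. This is not the authors' model, which additionally places a Gaussian process prior over the latent function across instances and also covers ordinal regression.

```python
import math
from collections import Counter


def probit(z):
    # Standard normal CDF, Phi(z), computed via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def majority_label(labels):
    # Pre-processing baseline: collapse all rater labels (+1/-1) to the most
    # common one; ties are broken toward +1 (an arbitrary assumption)
    counts = Counter(labels)
    return 1 if counts[1] >= counts[-1] else -1


def log_lik_majority(f, labels):
    # Probit log-likelihood of the single majority label given latent value f
    return math.log(probit(majority_label(labels) * f))


def log_lik_all_raters(f, labels):
    # Hypothetical multi-rater likelihood: every rater's label is treated as an
    # independent probit observation of the same latent value f
    return sum(math.log(probit(y * f)) for y in labels)


def best_f(log_lik, labels, grid):
    # Crude grid search for the latent value that maximizes the log-likelihood
    return max(grid, key=lambda f: log_lik(f, labels))


# Latent values from -3.0 to 3.0 in steps of 0.01
grid = [i / 100.0 for i in range(-300, 301)]
unanimous = [1, 1, 1]  # three experts agree
split = [1, 1, -1]     # two experts against one

print(best_f(log_lik_majority, split, grid))    # runs to the grid edge
print(best_f(log_lik_all_raters, split, grid))  # settles at a moderate value
```

With the split labels, the majority-vote likelihood keeps increasing in f, so it is as confident as a unanimous vote. The all-raters likelihood instead peaks where Φ(f) = 2/3 (f ≈ 0.43), which reflects the 2-to-1 disagreement among the experts.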
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |