Assessing the bias in samples of large online networks |
| |
Affiliation: | 1. Annenberg School for Communication, University of Pennsylvania, United States; 2. Oxford Internet Institute, University of Oxford, United Kingdom; 3. Institute for Biocomputation and Physics of Complex Systems, University of Zaragoza, Spain; 4. Qatar Computing Research Institute, Qatar Foundation, Qatar; 5. Department of Theoretical Physics, Faculty of Sciences, University of Zaragoza, Zaragoza 50009, Spain; 6. Complex Networks and Systems Lagrange Lab, Institute for Scientific Interchange, Turin, Italy |
| |
Abstract: | We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication. |
| |
Keywords: | Social media; Twitter; Political communication; Social protests; Measurement error; Graph comparison |
This article is indexed in ScienceDirect and other databases. |
|