

Synthesizing categorical datasets to enhance inference
Institution:1. University Bonn, Nußallee 21, 53115 Bonn, Germany;2. Institute for Food and Resource Economics – Bonn University, Germany;3. Institute of Crop Science and Resource Conservation – Bonn University, Germany;4. Department of Cultural and Social Anthropology – Cologne University, Germany;1. National Institute for Applied Statistics Research Australia, University of Wollongong, NSW 2522, Australia;2. National Centre for Social and Economic Modelling, University of Canberra, ACT 2601, Australia;3. Domain Data Products and Insights, Domain Group, Fairfax Media, Pyrmont, NSW 2009, Australia;1. Sensors Directorate, The United States Air Force Research Laboratory, 2241 Avionics Dr, WPAFB, OH, USA;2. Australian Centre for Field Robotics, University of Sydney, Rose Street Building, NSW, Australia;3. University of Technology Sydney (UTS), 15 Broadway, Ultimo, NSW, Australia;4. Data61 (CSIRO), Level 5, 13 Garden Street, Eveleigh, NSW, Australia
Abstract: A common data analysis setting consists of a collection of datasets of varying sizes that are all relevant to a particular scientific question but that include different subsets of the relevant variables, presumably with some overlap. Here, we demonstrate that synthesizing cross-classified categorical datasets drawn from an incompletely cross-classified common population, where many of the sets are incomplete (i.e., one or more of the classification variables are unobserved) but at least one is completely observed, is expected to reduce uncertainty about the cell probabilities in the associated multi-way contingency table, as well as about derived quantities such as relative risks and odds ratios. The word “expected” is the key point. When synthesizing complete datasets from a common population, a reduction in uncertainty is assured. However, when we work with a log-linear model to explain the complete table, improvement is not assured, because this model cannot be fitted to any of the incomplete datasets. We provide technical clarification of this point as well as a series of simulation examples, motivated by an adverse birth outcomes investigation, to illustrate what can be expected under such synthesis.
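The kind of synthesis described in the abstract can be illustrated with a minimal sketch. It is not the authors' log-linear analysis: it assumes a saturated multinomial model for a 2x2 table with a symmetric Dirichlet prior, one completely observed dataset, and one incomplete dataset in which only the row variable was recorded. Under those assumptions the row margin is updated by both datasets, while the column-given-row conditional is updated by the complete dataset only. The function name, the prior weight, and all counts are hypothetical and chosen purely for illustration.

```python
import random
import statistics


def posterior_sd_p11(complete, row_only=(0, 0), prior=1.0, draws=20000, seed=0):
    """Monte Carlo posterior SD of the cell probability p11 in a 2x2 table.

    complete: [[n11, n12], [n21, n22]], fully cross-classified counts.
    row_only: (m1, m2), counts from a dataset where only the row variable
              was observed (hypothetical incomplete dataset).
    prior:    Dirichlet weight placed on each of the four cells (assumption).
    """
    rng = random.Random(seed)
    (n11, n12), (n21, n22) = complete
    m1, m2 = row_only

    # Row margin p1+ ~ Beta: informed by the complete AND the row-only data.
    a1 = 2 * prior + n11 + n12 + m1
    a2 = 2 * prior + n21 + n22 + m2

    # Conditional p(col 1 | row 1) ~ Beta: informed by the complete data only.
    b1 = prior + n11
    b2 = prior + n12

    # p11 = p1+ * p(col 1 | row 1); the two factors are independent a posteriori.
    samples = [
        rng.betavariate(a1, a2) * rng.betavariate(b1, b2) for _ in range(draws)
    ]
    return statistics.pstdev(samples)


complete_counts = [[30, 10], [20, 40]]  # made-up complete table
sd_complete = posterior_sd_p11(complete_counts)
sd_pooled = posterior_sd_p11(complete_counts, row_only=(400, 600))
print(f"SD of p11, complete data only: {sd_complete:.4f}")
print(f"SD of p11, with row-only data: {sd_pooled:.4f}")
```

For these particular counts the row-only data are consistent with the complete table's margin, so the posterior spread of p11 shrinks. The sketch says nothing about the log-linear case, where the abstract notes that improvement is not assured.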
Keywords: Bayesian framework; Log-linear model; Contingency tables; Incomplete data
This article is indexed in ScienceDirect and other databases.