A comparison study on modeling of clustered and overdispersed count data for multiple comparisons |
| |
Authors: | Jochen Kruppa; Ludwig Hothorn |
| |
Affiliation: | (a) Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Berlin, Germany; (b) Berlin Institute of Health (BIH), Berlin, Germany; (c) Institute of Biostatistics, Leibniz University Hannover, Hannover, Germany |
| |
Abstract: | Data collected in various scientific fields are count data. One way to analyze such data is to compare the individual levels of the treatment factor using multiple comparisons. However, the measured individuals are often clustered, e.g. according to litter or rearing. This must be taken into account by estimating the parameters with a repeated measurement model. In addition, ignoring the overdispersion to which count data are prone inflates the type I error rate. We carry out simulation studies under several data settings and compare different multiple contrast tests, with parameter estimates from generalized estimation equations and generalized linear mixed models, in terms of coverage and rejection probabilities. We generate overdispersed, clustered count data in small samples, as can be observed in many biological settings. We found that generalized estimation equations outperform generalized linear mixed models if the variance-sandwich estimator is correctly specified. Furthermore, generalized linear mixed models show convergence problems under certain data settings, although some model implementations are less affected. Finally, we use an example of genetic data to demonstrate the application of the multiple contrast test and the problems of ignoring strong overdispersion. |
| |
Keywords: | Generalized estimation equations; overdispersion; simultaneous contrast tests; repeated measurements; generalized linear mixed models |
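The abstract refers to overdispersed, clustered count data. The following is a minimal illustrative sketch of what such data can look like, not the authors' simulation design: it assumes a Poisson model whose mean is multiplied by a shared log-normal cluster effect (e.g. a litter) and an individual gamma factor with mean 1. All names and parameter values (`simulate_counts`, `cluster_sd`, `gamma_shape`, the sample sizes) are hypothetical choices for illustration; the sample is deliberately larger than the small samples studied in the paper so that the variance-to-mean ratio is stable.

```python
import math
import random
from statistics import mean, variance


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(n_clusters=200, cluster_size=5, mu=4.0,
                    cluster_sd=0.5, gamma_shape=2.0, seed=1):
    """Simulate clustered, overdispersed counts (illustrative assumptions only)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_clusters):
        # Shared cluster effect on the log scale: induces within-cluster correlation.
        u = rng.gauss(0.0, cluster_sd)
        for _ in range(cluster_size):
            # Individual gamma factor with mean 1: adds extra-Poisson variation.
            g = rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
            counts.append(poisson(mu * math.exp(u) * g, rng))
    return counts


counts = simulate_counts()
# For a pure Poisson sample this ratio would be close to 1.
dispersion = variance(counts) / mean(counts)
print(f"mean = {mean(counts):.2f}, variance/mean = {dispersion:.2f}")
```

A variance-to-mean ratio well above 1 indicates overdispersion. A plain Poisson analysis of such data would understate the standard errors, which is how the inflated type I error rate mentioned in the abstract arises.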