Federal Statistical Coordination Today: An Epilogue as Prologue |
| |
Authors: | Katherine K. Wallman |
| |
Affiliation: | Council of Professional Associations on Federal Statistics, 806 15th Street, N.W., Suite 440, Washington, DC 20005, USA |
| |
Abstract: | A common data mining task is the search for associations in large databases. Here we consider the search for “interestingly large” counts in a large frequency table having millions of cells, most of which have an observed frequency of 0 or 1. We first construct a baseline, or null-hypothesis, expected frequency for each cell, and then suggest and compare screening criteria for ranking the deviations of observed from expected cell counts. A criterion based on the results of fitting an empirical Bayes model to the cell counts is recommended. An example compares these criteria for searching the FDA Spontaneous Reporting System database maintained by the Division of Pharmacovigilance and Epidemiology. In the example, each cell count is the number of reports combining one of 1,398 drugs with one of 952 adverse events (total of cell counts = 4.9 million), and the problem is to screen the drug-event combinations for possible further investigation. |
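The screening idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the baseline is assumed to be the row-by-column independence model (expected count = row total × column total / grand total), and the fitted empirical Bayes prior is replaced by a single gamma prior whose hyperparameters `alpha` and `beta` are fixed by hand rather than estimated from the data. The function names `expected_counts`, `shrunk_ratio`, and `rank_cells` are hypothetical.

```python
from typing import List, Tuple

# (cell index, observed n, expected E, raw ratio n/E, shrunk ratio)
Score = Tuple[Tuple[int, int], int, float, float, float]


def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """ASSUMED baseline: independence of rows and columns,
    E_ij = (row total i) * (column total j) / grand total."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    return [[r * c / total for c in col_tot] for r in row_tot]


def shrunk_ratio(n: int, e: float, alpha: float, beta: float) -> float:
    """Posterior mean of lambda when n ~ Poisson(lambda * e) and
    lambda ~ Gamma(shape=alpha, rate=beta): (alpha + n) / (beta + e).
    A single hand-set gamma prior is a simplification (ASSUMPTION)."""
    return (alpha + n) / (beta + e)


def rank_cells(table: List[List[int]], alpha: float = 1.0,
               beta: float = 1.0) -> List[Score]:
    """Score every cell and sort by the shrunk ratio, largest first."""
    exp = expected_counts(table)
    scores: List[Score] = []
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            e = exp[i][j]
            if e == 0:  # only happens when a whole row or column is empty
                continue
            scores.append(((i, j), n, e, n / e, shrunk_ratio(n, e, alpha, beta)))
    scores.sort(key=lambda s: s[4], reverse=True)
    return scores


if __name__ == "__main__":
    # Toy 3 x 3 table: rows are drugs, columns are events (invented data).
    table = [[50, 40, 10], [45, 35, 20], [0, 0, 1]]
    for cell, n, e, raw, shrunk in rank_cells(table, alpha=5.0, beta=5.0):
        print(cell, n, round(e, 3), round(raw, 3), round(shrunk, 3))
```

On this invented table the raw ratio n/E puts cell (2, 2) first, because a single report against an expected count of about 0.15 gives a ratio near 6.5. With `alpha = beta = 5` the shrunk ratio instead ranks cell (1, 2) first (20 reports against about 15.4 expected), which illustrates why a shrinkage criterion is attractive when most cells hold counts of 0 or 1.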
| |
Keywords: | Adverse drug reactions; Association; Gamma-Poisson model; Mixture model; Shrinkage estimate |