Abstract: | Two methods are commonly used for statistical inference on 2 × 2 contingency tables. One is the widely taught Pearson chi-square test, which uses the well-known χ² statistic. The chi-square test is appropriate for large-sample inference and, in the 2 × 2 case, is equivalent to the Z-test based on the difference between the two sample proportions. The other is Fisher’s exact test, which evaluates the likelihood of each table with the same marginal totals. This article shows mathematically that the two methods do not always agree on which tables count as extreme. Our analysis yields one-sided and two-sided conditions under which such a disagreement can occur. We also address whether this discrepancy can lead the two tests to different conclusions when testing homogeneity or independence. Our examination sheds light on which test should be trusted when the two tests draw different conclusions. |
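As a rough illustration of the two tests named in the abstract, the sketch below computes the Pearson χ² statistic (without continuity correction), the pooled two-proportion Z statistic, and a two-sided Fisher p-value for one small table. The function names and the example table are hypothetical, not taken from the article, and the two-sided Fisher rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention assumed for the sketch; the article's own definitions of "extreme" may differ.

```python
from math import comb, erfc, sqrt


def pearson_chi2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def two_proportion_z(a, b, c, d):
    """Pooled Z statistic comparing the row proportions a/(a+b) and c/(c+d)."""
    n1, n2 = a + b, c + d
    pooled = (a + c) / (n1 + n2)
    return (a / n1 - c / n2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher p-value (assumed convention: probability ordering)."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x,
        # given the fixed marginal totals.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    table = (3, 1, 1, 3)  # hypothetical table [[3, 1], [1, 3]]
    chi2 = pearson_chi2(*table)
    z = two_proportion_z(*table)
    print("chi-square statistic:", chi2)
    print("z squared:           ", z ** 2)
    # Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    print("chi-square p-value:  ", erfc(sqrt(chi2 / 2)))
    print("Fisher two-sided p:  ", fisher_two_sided(*table))
```

For this table the χ² statistic and the squared Z statistic coincide, which is the 2 × 2 equivalence the abstract mentions, while the Fisher p-value comes from the exact distribution over tables with the same margins and need not match the large-sample χ² p-value.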