A Monte Carlo comparison of categorical tests of independence
Abstract
The X² and G² tests are the most frequently applied tests of independence between two categorical variables. However, to the best of our knowledge, no one has compared them extensively and ultimately answered the question of which to use and when. Further, their applicability in cases with zero frequencies has been debated, and (non-parametric) permutation tests have been suggested. In this work we perform extensive Monte Carlo simulation studies to address both of these points. As expected, for large sample sizes (>1,000) the X² and G² tests are indistinguishable. For small sample sizes (≤1,000), though, we provide strong evidence supporting the use of the X² test, regardless of zero frequencies, for the case of unconditional independence. For testing conditional independence, we suggest the permutation-based G² test, at the cost of it being computationally more expensive. The G² test exhibited inferior performance and its use should be limited.
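For readers less familiar with the two statistics, the following is a minimal sketch of the textbook Pearson X² and likelihood-ratio G² statistics for a two-way table, together with a simple permutation p-value for unconditional independence. It is an illustration under stated assumptions, not the simulation code used in this work: the function names are hypothetical, zero observed cells are assumed to contribute nothing to G² (the usual convention), and the permutation count and seed are arbitrary defaults. Conditional independence is not covered here.

```python
import math
import random


def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def x2_statistic(table):
    """Pearson X^2: sum over cells of (O - E)^2 / E."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))


def g2_statistic(table):
    """Likelihood-ratio G^2: 2 * sum over cells of O * log(O / E).

    Assumption: cells with O = 0 contribute 0 (limit of O * log O).
    """
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, exp)
                     for o, e in zip(obs_row, exp_row) if o > 0)


def crosstab(x, y, x_levels, y_levels):
    """Build the contingency table of two equally long label sequences."""
    row = {v: i for i, v in enumerate(x_levels)}
    col = {v: j for j, v in enumerate(y_levels)}
    table = [[0] * len(y_levels) for _ in x_levels]
    for xi, yi in zip(x, y):
        table[row[xi]][col[yi]] += 1
    return table


def permutation_pvalue(x, y, statistic, n_perm=999, seed=0):
    """Permutation p-value: shuffle y, rebuild the table, recompute.

    Shuffling y keeps both margins fixed, so no expected count is zero.
    """
    rng = random.Random(seed)
    x_levels, y_levels = sorted(set(x)), sorted(set(y))
    observed = statistic(crosstab(x, y, x_levels, y_levels))
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        stat = statistic(crosstab(x, y_perm, x_levels, y_levels))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Hypothetical 2 x 2 example with counts 10, 20, 20, 10.
example_table = [[10, 20], [20, 10]]
x_labels = ["a"] * 30 + ["b"] * 30
y_labels = ["u"] * 10 + ["v"] * 20 + ["u"] * 20 + ["v"] * 10

x2_value = x2_statistic(example_table)
g2_value = g2_statistic(example_table)
p_x2 = permutation_pvalue(x_labels, y_labels, x2_statistic)
p_g2 = permutation_pvalue(x_labels, y_labels, g2_statistic)
```

The asymptotic versions of both tests would compare `x2_value` and `g2_value` with a chi-squared distribution on (rows − 1)(columns − 1) degrees of freedom, whereas the permutation version replaces that reference distribution with the statistic's values over shuffled labels.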