New tight approximations for Fisher's exact test
Abstract
Fisher's exact test is often a preferred method to estimate the significance of statistical dependence. However, in large data sets the test is usually too worksome to be applied, especially in an exhaustive search (data mining). The traditional solution is to approximate the significance with the χ2-measure, but the accuracy is often unacceptable. As a solution, we introduce a family of upper bounds, which are fast to calculate and approximate Fisher's p-value accurately. In addition, the new approximations are not sensitive to the data size, distribution, or smallest expected counts like the χ2-based approximation. According to both theoretical and experimental analysis, the new approximations produce accurate results for all sufficiently strong dependencies. The basic form of the approximation can fail with weak dependencies, but the general form of the upper bounds can be adjusted to be arbitrarily accurate.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.