Top-k Approximate Functional Dependency Discovery
Abstract
Approximate functional dependencies (AFDs) relax exact functional dependencies by tolerating a bounded degree of violation, making them suited for data quality auditing. Threshold-based discovery returns all dependencies above a user-specified cutoff, but output size is uncontrollable, the right threshold varies across datasets, and widely used measures are sensitive to LHS dimensionality. We study global top-k AFD discovery, where neither the LHS nor the RHS is fixed and the k strongest dependencies under μ+ are returned directly. The cross-attribute comparability of μ+ makes such a global ranking well-defined. We prove a Triangle Incompatibility Theorem showing that minimality, global top-k ranking, and exact-k output cannot simultaneously hold under any non-monotonic scoring function, justifying the removal of the minimality requirement. We present two algorithms: TALE-Base, which returns the exact global top-k result by exhaustive level-wise evaluation, and TALE-Opt, which reduces computation through Apriori-style candidate generation, LHS computation reuse, and two complementary pruning rules exploiting exact FD monotonicity and an optimistic upper bound on μ+. Experiments on 41 real-world datasets show that TALE-Opt achieves pruning ratios up to 99.81\% and speedups over TALE-Base up to 78.81×.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.