Approximately Minwise Independence with Twisted Tabulation
Abstract
A random hash function $h$ is $\varepsilon$-minwise if for any set $S$ with $|S|=n$ and any element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of Patrascu and Thorup [SODA'13] makes $c=O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $O(1/u^{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79], $O(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Patrascu and Thorup [STOC'11] had shown that simple tabulation, using the same space and number of lookups, yields $O(1/n^{1/c})$-minwise independence, which is good for large sets but useless for small sets. Our analysis uses some of the same methods, but is much cleaner, bypassing a complicated induction argument.
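To make the $\varepsilon$-minwise definition concrete, the following is a minimal sketch, not the authors' code, of one plausible reading of twisted tabulation together with an empirical estimate of $\Pr[h(x)=\min h(S)]$. It assumes 32-bit keys split into $c=4$ characters of 8 bits (so each table has $u^{1/c}=256$ entries) and 32-bit hash values. It also assumes the following layout: the tables for characters $x_1,\dots,x_{c-1}$ each return a (twister, hash) pair, the XOR of the twisters is XORed into character $x_0$, and that twisted character indexes a final table. The names `TwistedTabulation`, `min_frequency`, `CHAR_BITS`, `C`, and `HASH_BITS` are hypothetical. Tables are filled from Python's `random` module, which is a convenience for illustration, not a source of true randomness.

```python
import random

CHAR_BITS = 8        # assumed character size: 8 bits, so tables have 256 entries
C = 4                # assumed number of characters per key, keys lie in [2**32]
MASK = (1 << CHAR_BITS) - 1
HASH_BITS = 32       # assumed output size of the hash values


class TwistedTabulation:
    """Hypothetical sketch of twisted tabulation with C table lookups per key."""

    def __init__(self, rng: random.Random):
        size = 1 << CHAR_BITS
        # Tables for characters x_1..x_{C-1}: each entry is (twister, hash value).
        self.twist_tables = [
            [(rng.getrandbits(CHAR_BITS), rng.getrandbits(HASH_BITS)) for _ in range(size)]
            for _ in range(C - 1)
        ]
        # Final table, indexed by the twisted character x_0 XOR twist.
        self.final_table = [rng.getrandbits(HASH_BITS) for _ in range(size)]

    def __call__(self, x: int) -> int:
        x0 = x & MASK                 # character that will be twisted
        rest = x >> CHAR_BITS         # remaining C-1 characters
        twist, h = 0, 0
        for table in self.twist_tables:
            t, v = table[rest & MASK]
            twist ^= t                # accumulate the twister
            h ^= v                    # accumulate the hash value, as in simple tabulation
            rest >>= CHAR_BITS
        return h ^ self.final_table[x0 ^ twist]


def min_frequency(keys, target, trials, seed=0):
    """Estimate Pr[h(target) = min h(keys)] over freshly drawn random tables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = TwistedTabulation(rng)
        if h(target) == min(h(k) for k in keys):
            hits += 1
    return hits / trials


# Example: a fixed set S of n = 8 distinct 32-bit keys.
KEYS = random.Random(1).sample(range(1 << 32), 8)
estimate = min_frequency(KEYS, KEYS[0], trials=1500, seed=7)
print(f"empirical Pr[h(x) = min h(S)] = {estimate:.3f}, ideal 1/n = {1 / len(KEYS):.3f}")
```

Compared with simple tabulation, the only extra work in this sketch is computing `twist` and XORing it into `x0` before the last lookup, so the lookup count stays at $c$. A small simulation like this can only show that the frequency is near $1/n$ up to sampling noise; the $O(1/u^{1/c})$ bias bound is a statement the paper proves, not something this experiment establishes.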