Non-Empty Bins with Simple Tabulation Hashing

Abstract

We consider the hashing of a set X ⊆ U with |X| = m using a simple tabulation hash function h: U → [n] = {0, …, n−1} and analyse the number of non-empty bins, that is, the size of h(X). We show that the expected size of h(X) matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, Bloom filters are normally proportioned for a desired low false-positive probability assuming fully random hashing (see en.wikipedia.org/wiki/Bloom_filter). Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.
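
To make the quantity |h(X)| concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes keys are split into a fixed number of equal-width characters and that n is a power of two, so that XOR-ing table entries yields a value in [n]. The function names (make_simple_tabulation, count_non_empty_bins, expected_non_empty_fully_random) and all parameters are hypothetical and chosen for illustration. For comparison it also evaluates the standard fully random expectation n(1 − (1 − 1/n)^m).

```python
import random


def make_simple_tabulation(num_chars, char_bits, n, seed=0):
    """Return a simple tabulation hash h: keys -> [n].

    Assumptions (for illustration only): keys are non-negative integers
    of at most num_chars * char_bits bits, and n is a power of two >= 2.
    """
    out_bits = n.bit_length() - 1
    if n < 2 or n != 1 << out_bits:
        raise ValueError("this sketch assumes n is a power of two >= 2")
    rng = random.Random(seed)
    # One independent table of random out_bits-bit values per character position.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
        for _ in range(num_chars)
    ]
    mask = (1 << char_bits) - 1

    def h(key):
        value = 0
        for i in range(num_chars):
            char = (key >> (i * char_bits)) & mask  # i-th character of the key
            value ^= tables[i][char]                # XOR the table lookups
        return value

    return h


def count_non_empty_bins(h, keys):
    """Size of h(X): the number of distinct bins hit by the keys."""
    return len({h(x) for x in keys})


def expected_non_empty_fully_random(m, n):
    """Expected number of non-empty bins for m keys under fully random hashing."""
    return n * (1.0 - (1.0 - 1.0 / n) ** m)


if __name__ == "__main__":
    n, m = 1024, 1000
    h = make_simple_tabulation(num_chars=4, char_bits=8, n=n, seed=1)
    keys = range(m)  # a structured input: consecutive 32-bit integers
    print("simple tabulation:", count_non_empty_bins(h, keys))
    print("fully random expectation:", expected_non_empty_fully_random(m, n))
```

Running the script prints the observed count next to the fully random expectation; a single run only illustrates the comparison and is not evidence for the bounds stated in the abstract.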
