Optimal Lower Bound for Itemset Frequency Indicator Sketches

Abstract

Given a database, a common problem is to find the pairs or k-tuples of items that frequently co-occur. One specific problem is to create a small space "sketch" of the data that records which k-tuples appear in more than an ε fraction of rows of the database. We improve the lower bound of Liberty, Mitzenmacher, and Thaler [LMT14], showing that Ω((1/ε) d log(εd)) bits are necessary even in the case of k=2. This matches the sampling upper bound for all ε ≥ 1/d^0.99, and (in the case of k=2) another trivial upper bound for ε = 1/d.
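
The sampling upper bound mentioned in the abstract can be illustrated with a minimal sketch: keep a uniform random sample of rows and answer each query from the sample alone. The abstract does not spell out the construction, so everything below is an assumption made for illustration: the function names `build_sample_sketch` and `appears_frequent`, the representation of rows as sets of item indices, and the choice of sample size and decision threshold are all hypothetical.

```python
import random


def build_sample_sketch(rows, num_samples, seed=0):
    """Return num_samples rows drawn uniformly at random, with replacement.

    The sampled rows are the whole sketch; the original database is not
    consulted again. (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    return [rows[rng.randrange(len(rows))] for _ in range(num_samples)]


def appears_frequent(sketch, itemset, threshold):
    """Report whether the itemset occurs in more than a `threshold`
    fraction of the sampled rows. (Hypothetical helper.)"""
    items = frozenset(itemset)
    hits = sum(1 for row in sketch if items <= row)
    return hits / len(sketch) > threshold


# Toy database: each row is the set of item indices present in that row.
rows = [frozenset({0, 1, 2})] * 60 + [frozenset({2, 3})] * 40
sketch = build_sample_sketch(rows, num_samples=50)
answer = appears_frequent(sketch, (0, 1), threshold=0.3)
```

Because each sampled row over d items costs d bits, the size of such a sketch is d times the number of samples kept. The answer for any one itemset is only correct with some probability that depends on the sample size, which this toy example does not attempt to tune.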
