Testing frequency distributions in a stream
Abstract
We study how to verify specific frequency distributions when we observe a stream of N data items taken from a universe of n distinct items. We introduce the relative Fréchet distance to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertions only and sliding windows. We present a Tester for a certain class of functions which, when f is given and g is defined by a stream, decides with high probability whether f is close to g or far from g. If f is uniform, we show an $\Omega(n)$ space lower bound. If f decreases fast enough, the Tester uses only $O(\log^2 n \cdot \log\log n)$ space. The analysis relies on the SpaceSaving algorithm [MAE2005, Z22] and on sampling the stream.
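For readers unfamiliar with the SpaceSaving algorithm cited above, the following is a minimal Python sketch of its standard insertion-only form. It is an illustration only, not the Tester described in this paper: the function name `space_saving` and the parameter `k` (the number of counters kept) are assumptions made for this example, and the linear scan for the minimum counter is a simplification.

```python
def space_saving(stream, k):
    """Approximate item frequencies in a stream using at most k counters.

    Hypothetical helper for illustration; names are not taken from the paper.
    """
    counters = {}  # monitored item -> estimated count
    for item in stream:
        if item in counters:
            # Item is already monitored: count the occurrence.
            counters[item] += 1
        elif len(counters) < k:
            # A free counter remains: start monitoring the item.
            counters[item] = 1
        else:
            # No free counter: evict an item with the smallest count and
            # let the new item inherit that count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


# Usage: 9 items, 3 counters.
estimates = space_saving(list("aaaabbbcd"), k=3)
print(estimates)
```

Each estimate is at least the true count of the monitored item, and the counters always sum to the number of items read, so the smallest counter (and hence any overestimate) is at most N/k.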