Conservation of the t-digest Scale Invariant

Abstract

A t-digest is a compact data structure that allows estimates of quantiles with increased accuracy near q = 0 or q = 1. This is done by clustering samples from ℝ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called k-size of the centroid is always at most 1. The k-size is defined using a scale function that maps quantile q to index k. Since the centroids are real numbers, they can be ordered, and thus the quantile range of a centroid can be mapped into an interval in k whose size is the k-size of that centroid. The accuracy of quantile estimates made using a t-digest depends on this constraint remaining invariant even as new data are added or t-digests are merged. This paper provides proofs of this invariance for four practically important scale functions.
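To make the k-size constraint concrete, the following is a minimal sketch, not the paper's implementation. It assumes the arcsine scale function k(q) = (δ / 2π) · asin(2q − 1) commonly used in the t-digest literature; the abstract does not name its four scale functions, so this choice, the compression parameter `delta`, and the helper names `k1`, `k_sizes`, and `satisfies_invariant` are all illustrative assumptions. Each centroid is represented only by its weight (number of points), listed in order of centroid value.

```python
import math


def k1(q, delta):
    """Assumed scale function: maps a quantile q in [0, 1] to an index k."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k_sizes(weights, delta):
    """Return the k-size of each centroid, given weights in centroid order.

    A centroid covers the quantile range [q_left, q_right]; its k-size is
    the length of the image of that range under the scale function.
    """
    total = sum(weights)
    sizes = []
    cumulative = 0
    for w in weights:
        q_left = cumulative / total
        cumulative += w
        q_right = cumulative / total
        sizes.append(k1(q_right, delta) - k1(q_left, delta))
    return sizes


def satisfies_invariant(weights, delta, tol=1e-12):
    """Check that every centroid has k-size at most 1 (up to rounding)."""
    return all(size <= 1 + tol for size in k_sizes(weights, delta))


# Hypothetical weights: small centroids at the tails, a large one in the middle.
weights = [1, 2, 5, 10, 5, 2, 1]
for delta in (5, 10):
    print(delta, satisfies_invariant(weights, delta))
```

Because this scale function is steepest near q = 0 and q = 1, a centroid of a given weight has a larger k-size at the tails than in the middle, which is why tail centroids must stay small and tail quantiles are estimated more accurately. Under the assumed function, the k-sizes always sum to δ/2, so a larger `delta` makes each centroid's k-size larger and forces more, smaller centroids.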
