Run Compressed Rank/Select for Large Alphabets
Abstract
Given a string of length $n$ that is composed of $r$ runs of letters from the alphabet $\{0,1,\ldots,\sigma-1\}$ such that $2 \le \sigma \le r$, we describe a data structure that, provided $r \le n / \log^{\omega(1)} n$, stores the string in $r\log\frac{n\sigma}{r} + o(r\log\frac{n\sigma}{r})$ bits and supports select and access queries in $O(\log\frac{\log(n/r)}{\log\log n})$ time and rank queries in $O(\log\frac{\log(n\sigma/r)}{\log\log n})$ time. We show that $r\log\frac{n(\sigma-1)}{r} - O(\log\frac{n}{r})$ bits are necessary for any such data structure and, thus, our solution is succinct. We also describe a data structure that uses $(1 + \epsilon)r\log\frac{n\sigma}{r} + O(r)$ bits, where $\epsilon > 0$ is an arbitrary constant, with the same query times but without the restriction $r \le n / \log^{\omega(1)} n$. By simple reductions to the colored predecessor problem, we show that the query times are optimal in the important case $r \ge 2^{\log^{\delta} n}$, for an arbitrary constant $\delta > 0$. We implement our solution and compare it with the state of the art, showing that the closest competitors consume 31-46% more space.
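To make the three queries concrete, here is a minimal Python sketch of access, rank, and select over a run-length encoded string. It is a naive baseline and not the data structure described in the abstract: it answers queries by binary search over run boundaries, in O(log r) time, and makes no attempt at succinct space. The class name `RunLengthString`, its method names, and the indexing conventions (0-based positions, `rank(c, i)` counting occurrences in the prefix of length `i`, 1-based `j` in `select`) are assumptions chosen for illustration.

```python
from bisect import bisect_right
from itertools import groupby


class RunLengthString:
    """Naive run-length baseline (binary search), not the paper's structure."""

    def __init__(self, text):
        self.n = len(text)
        self.starts = []   # start position of each run
        self.letters = []  # letter of each run
        # Per letter: (starts of its runs, occurrences before each of its runs).
        self.by_letter = {}
        pos = 0
        for letter, group in groupby(text):
            length = sum(1 for _ in group)
            self.starts.append(pos)
            self.letters.append(letter)
            run_starts, before = self.by_letter.setdefault(letter, ([], [0]))
            run_starts.append(pos)
            before.append(before[-1] + length)
            pos += length

    def access(self, i):
        """Return the letter at position i (0-based)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.letters[bisect_right(self.starts, i) - 1]

    def rank(self, c, i):
        """Count occurrences of c in text[0:i]."""
        if c not in self.by_letter:
            return 0
        run_starts, before = self.by_letter[c]
        k = bisect_right(run_starts, i - 1)  # runs of c that start before i
        if k == 0:
            return 0
        # Occurrences in earlier runs of c, plus the part of run k-1 before i.
        run_len = before[k] - before[k - 1]
        return before[k - 1] + min(i - run_starts[k - 1], run_len)

    def select(self, c, j):
        """Return the position of the j-th occurrence of c (j is 1-based)."""
        run_starts, before = self.by_letter.get(c, ([], [0]))
        if not 1 <= j <= before[-1]:
            raise ValueError("no such occurrence")
        k = bisect_right(before, j - 1) - 1  # run holding the j-th occurrence
        return run_starts[k] + (j - 1 - before[k])
```

For example, `RunLengthString("aaabbbaacc")` has r = 4 runs over n = 10 positions; `access(4)` falls in the second run, and `select("a", 4)` lands on the first position of the third run. The sketch stores a few machine words per run, which is far from the $r\log\frac{n\sigma}{r}$ bits discussed above.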