Compressed Index with Construction in Compressed Space
Abstract
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta \log\frac{n}{\delta})$ space, measured in $O(\log n)$-bit machine words, which can search in $s$ any string of length $m$ in $O(m + (\mathrm{occ} + 1)\log^{\epsilon} n)$ time, where $\mathrm{occ}$ is the number of occurrences and $\epsilon > 0$ is any fixed constant (the big-O in the space bound hides factor $\frac{1}{\epsilon}$). Crucially, the index can be built in $O(n \log n)$ expected time by one left-to-right pass on the string $s$ in a streaming fashion with $O(\delta \log\frac{n}{\delta})$ construction space. The index does not use the Karp-Rabin fingerprints, and the randomization in the construction time can be eliminated by using deterministic dictionaries instead of hash tables (with a slowdown). The search time matches currently best results and the space is almost optimal (the known optimum is $O(\delta \log\frac{n}{\delta\alpha})$, where $\alpha = \log_{\sigma} n$ and $\sigma$ is the alphabet size, and it coincides with $O(\delta \log\frac{n}{\delta})$ when $\delta = O(n / \alpha^2)$). This is the first index that can be constructed within such space and with such time guarantees. To avoid uninteresting marginal cases, all above bounds are stated for $\delta \ge \Omega(\log\log n)$.
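The abstract refers to $\delta$ only as "a known compression measure". Assuming it is the substring complexity commonly used in the compressed-indexing literature, $\delta = \max_{k \ge 1} d_k(s)/k$, where $d_k(s)$ is the number of distinct substrings of $s$ of length $k$, the following minimal Python sketch computes it by brute force. The function name `substring_complexity` is hypothetical, and the code only illustrates the measure; it is not the streaming construction described in the paper and uses far more time and space.

```python
from fractions import Fraction


def substring_complexity(s: str) -> Fraction:
    """Brute-force delta = max over k of d_k(s) / k.

    d_k(s) is the number of distinct length-k substrings of s.
    Assumed definition (not stated in the abstract); illustration only.
    """
    n = len(s)
    best = Fraction(0)
    for k in range(1, n + 1):
        # Collect every distinct substring of length k.
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, Fraction(len(distinct), k))
    return best
```

For a highly repetitive string such as `"a" * 1000` this returns 1, while a string of 1000 pairwise distinct symbols gives 1000, which is why a space bound of $O(\delta \log\frac{n}{\delta})$ can be far smaller than $n$ on repetitive inputs.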