Optimal-Time Dictionary-Compressed Indexes

Abstract

We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings, including string attractors --- new combinatorial objects encompassing most known compressibility measures for highly repetitive texts ---, and grammars based on locally-consistent parsing. More in detail, let γ be the size of the smallest attractor for a text T of length n. The measure γ is an (asymptotic) lower bound to the size of dictionary compressors based on Lempel--Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space O(γ(n/γ)), and our lightest indexes work within the same asymptotic space. Let ε>0 be a suitably small constant fixed at construction time, m be the pattern length, and occ be the number of its text occurrences. Our index counts pattern occurrences in O(m+2+εn) time, and locates them in O(m+(occ+1)ε n) time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within O((m+occ)\,polylog\,n) time. Further, by increasing the space to O(γ(n/γ)ε n), we reduce the locating time to the optimal O(m+occ), and within O(γ(n/γ) n) space we can also count in optimal O(m) time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in O(n) space and O(n n) expected time. As a byproduct of independent interest...

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…