Improved Grammar-Based Compressed Indexes
Abstract
We introduce the first grammar-compressed representation of a sequence that supports searches in time that depends only logarithmically on the size of the grammar. Given a text $T[1..u]$ that is represented by a (context-free) grammar of $n$ (terminal and nonterminal) symbols and size $N$ (measured as the sum of the lengths of the right hands of the rules), a basic grammar-based representation of $T$ takes $N \lg n$ bits of space. Our representation requires $2N \lg n + N \lg u + \epsilon\, n \lg n + o(N \lg n)$ bits of space, for any $0 < \epsilon \le 1$. It can find the positions of the $occ$ occurrences of a pattern of length $m$ in $T$ in $O((m^2/\epsilon) \lg(\frac{\lg u}{\lg n}) + occ \lg n)$ time, and extract any substring of length $\ell$ of $T$ in time $O(\ell + h \lg(N/h))$, where $h$ is the height of the grammar tree.
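To make the quantities in the abstract concrete, the following is a minimal Python sketch that computes n (number of symbols), N (sum of right-hand-side lengths) and u (text length) for a toy grammar, evaluates the leading terms of the stated space bound, and extracts a substring by descending the grammar. The grammar, the function names, and the naive extraction routine are illustrative assumptions, not the data structure described in the paper, and the sketch makes no attempt to achieve the stated time bounds.

```python
import math
from functools import lru_cache

# Hypothetical toy grammar (an assumption for illustration): each nonterminal
# maps to its right-hand side; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": ["A", "B", "A"],
    "A": ["a", "b"],
    "B": ["A", "c"],
}
START = "S"


def symbols(grammar):
    """All terminal and nonterminal symbols; its size is n."""
    syms = set(grammar)
    for rhs in grammar.values():
        syms.update(rhs)
    return syms


def grammar_size(grammar):
    """N: the sum of the lengths of the right-hand sides of the rules."""
    return sum(len(rhs) for rhs in grammar.values())


def expansion_lengths(grammar):
    """Length of the string generated by each symbol (terminals have length 1)."""
    @lru_cache(maxsize=None)
    def length(sym):
        if sym not in grammar:
            return 1
        return sum(length(s) for s in grammar[sym])

    return {sym: length(sym) for sym in symbols(grammar)}


def space_bound_bits(N, n, u, eps):
    """Leading terms 2N lg n + N lg u + eps*n lg n; the o(N lg n) term is ignored."""
    return 2 * N * math.log2(n) + N * math.log2(u) + eps * n * math.log2(n)


def extract(grammar, lengths, sym, start, count):
    """Naive extraction: up to `count` characters of sym's expansion from 0-based `start`."""
    if count <= 0:
        return ""
    if sym not in grammar:
        return sym if start == 0 else ""
    out = []
    for child in grammar[sym]:
        child_len = lengths[child]
        if start >= child_len:
            start -= child_len  # the wanted range begins after this child
            continue
        piece = extract(grammar, lengths, child, start, count)
        out.append(piece)
        count -= len(piece)
        start = 0  # later children are read from their beginning
        if count == 0:
            break
    return "".join(out)


if __name__ == "__main__":
    lengths = expansion_lengths(GRAMMAR)
    n, N, u = len(symbols(GRAMMAR)), grammar_size(GRAMMAR), lengths[START]
    print("n =", n, "N =", N, "u =", u)
    print("basic representation bits:", N * math.log2(n))
    print("space bound (leading terms), eps=1:", space_bound_bits(N, n, u, 1.0))
    print("substring:", extract(GRAMMAR, lengths, START, 2, 4))
```

The extraction here skips whole children using precomputed expansion lengths, which is the basic idea behind navigating a grammar tree without expanding the full text. On a grammar this small the space bound exceeds the plain text size; the formula is only meaningful asymptotically for highly repetitive inputs.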