Dynamic Grammar-Compressed Self-Index in δ-Optimal Space
Abstract
A compressed self-index stores a string in compressed form while supporting locate queries without decompression. For highly repetitive strings (arising in web crawls, versioned documents, and genomic collections), static self-indexes can match the δ-optimal lower bound of Ω(δ log(n log σ / (δ log n)) log n) bits up to constant factors, where n is the string length, σ is the alphabet size, and δ is the substring complexity. Their dynamic counterparts, however, remain scarce: every existing dynamic self-index either fails to attain δ-optimal space, pays at least Ω(log n) time per reported occurrence during locate, or exposes the longest common prefix (LCP) of the text inside its update time. We present the dynamic RR-index, a dynamic grammar-compressed self-index built on the restricted recompression run-length straight-line program (RLSLP). To our knowledge, it is the first dynamic self-index to attain δ-optimal space. The index occupies expected O(δ log(n log σ / (δ log n)) log n) bits, answers locate queries in expected O(m + log m log² n + occ (log n / log log n)) time (where m is the pattern length and occ is the number of occurrences), and supports insertions and deletions of a length-m' substring in expected amortized O(m' log² n + log³ n) time, with no dependence on the LCP. On eleven highly repetitive corpora, including a 37 GB Wikipedia dump and a 59 GB human-chromosome collection, the dynamic RR-index is up to 77× faster than the dynamic r-index on updates and up to 11× faster than other dynamic indexes on locate.
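The abstract uses the substring complexity δ without defining it. As a rough illustration only, the sketch below assumes the standard definition from the literature, δ = max over k of d_k / k, where d_k is the number of distinct length-k substrings; this definition is not stated in the abstract itself. The function names `substring_complexity` and `delta_optimal_bits` are hypothetical, the first is a naive quadratic-space computation suitable only for tiny strings, and the second evaluates just the leading term δ log(n log σ / (δ log n)) log n with all constant factors dropped. It is not the paper's method.

```python
import math


def substring_complexity(s: str) -> float:
    """Naive delta = max_k d_k / k, with d_k = number of distinct length-k substrings.

    Assumes the standard definition of substring complexity (not given in the abstract).
    """
    n = len(s)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


def delta_optimal_bits(n: int, sigma: int, delta: float) -> float:
    """Leading term delta * log(n log sigma / (delta log n)) * log n, base-2 logs.

    Constant factors are ignored, so this is an order-of-magnitude guide only.
    """
    if n < 2 or sigma < 2 or delta <= 0:
        raise ValueError("need n >= 2, sigma >= 2, delta > 0")
    inner = n * math.log2(sigma) / (delta * math.log2(n))
    if inner <= 1:
        raise ValueError("argument of the outer logarithm must exceed 1")
    return delta * math.log2(inner) * math.log2(n)


if __name__ == "__main__":
    for text in ("aaaa", "abab", "abcd"):
        print(text, substring_complexity(text))
    print(delta_optimal_bits(n=1024, sigma=4, delta=2.0))
```

A highly repetitive string such as "aaaa" has a small δ relative to its length, which is why a space bound that scales with δ rather than n is attractive for versioned or genomic collections.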