Rank, select and access in grammar-compressed strings
Abstract
Given a string $S$ of length $N$ on a fixed alphabet of $\sigma$ symbols, a grammar compressor produces a context-free grammar $G$ of size $n$ that generates $S$ and only $S$. In this paper we describe data structures to support the following operations on a grammar-compressed string: $\mathrm{rank}_c(S,i)$ (return the number of occurrences of symbol $c$ before position $i$ in $S$); $\mathrm{select}_c(S,i)$ (return the position of the $i$th occurrence of $c$ in $S$); and $\mathrm{access}(S,i,j)$ (return the substring $S[i,j]$). For rank and select we describe data structures of size $O(n\sigma\log N)$ bits that support both operations in $O(\log N)$ time. We propose another structure that uses $O(n\sigma\log(N/n)(\log N)^{1+\epsilon})$ bits and supports both queries in $O(\log N/\log\log N)$ time, where $\epsilon>0$ is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires $O(n\log N)$ bits of space and $O(\log N + m/\log_\sigma N)$ time to extract $m=j-i+1$ consecutive symbols from $S$. Alternatively, we can achieve $O(\log N/\log\log N + m/\log_\sigma N)$ query time using $O(n\log(N/n)(\log N)^{1+\epsilon})$ bits of space. This matches a lower bound stated by Verbin and Yu for strings where $N$ is polynomially related to $n$.
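To make the three operations concrete, here is a minimal Python sketch, assuming the grammar is given as a straight-line program (SLP) in Chomsky normal form with rules listed children-first; the class name `GrammarString` and the rule encoding are hypothetical, and positions are 0-based. For each rule it stores the expansion length and $\sigma$ per-symbol counts ($n$ rules times $\sigma$ counters of $O(\log N)$ bits each, consistent with an $O(n\sigma\log N)$-bit budget) and answers each query by one root-to-leaf descent. This is not the paper's method: the sketch runs in $O(h)$ time for grammar height $h$ (plus $O(m)$ for access), and reaching $O(\log N)$ independently of the height, or extracting $\Theta(\log_\sigma N)$ symbols per step, needs further techniques not shown here.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding: a rule is a terminal symbol (str) or a pair
# (left, right) of indices of earlier rules; the last rule is the start symbol.
Rule = Union[str, Tuple[int, int]]


class GrammarString:
    """rank/select/access on the string generated by an SLP, by root-to-leaf descent."""

    def __init__(self, rules: List[Rule], alphabet: str) -> None:
        self.rules = rules
        self.length: List[int] = []            # |expansion| of each rule
        self.count: List[Dict[str, int]] = []  # per-symbol occurrence counts
        for rule in rules:  # children always precede parents
            if isinstance(rule, str):
                self.length.append(1)
                self.count.append({c: int(c == rule) for c in alphabet})
            else:
                left, right = rule
                self.length.append(self.length[left] + self.length[right])
                self.count.append({c: self.count[left][c] + self.count[right][c]
                                   for c in alphabet})
        self.root = len(rules) - 1

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in S[0:i], i.e. before 0-based position i."""
        if not 0 <= i <= self.length[self.root]:
            raise ValueError("i out of range")
        node, result = self.root, 0
        while i > 0 and not isinstance(self.rules[node], str):
            left, right = self.rules[node]
            if i <= self.length[left]:
                node = left                           # prefix ends inside left child
            else:
                result += self.count[left].get(c, 0)  # skip whole left child in O(1)
                i -= self.length[left]
                node = right
        if i > 0:  # stopped at a terminal covered by the prefix (here i == 1)
            result += int(self.rules[node] == c)
        return result

    def select(self, c: str, i: int) -> int:
        """0-based position of the i-th (1-based) occurrence of c."""
        if not 1 <= i <= self.count[self.root].get(c, 0):
            raise ValueError("S has fewer than i occurrences of c")
        node, pos = self.root, 0
        while not isinstance(self.rules[node], str):
            left, right = self.rules[node]
            if i <= self.count[left][c]:
                node = left                   # target occurrence is in left child
            else:
                i -= self.count[left][c]      # skip the left child's occurrences
                pos += self.length[left]
                node = right
        return pos

    def access(self, i: int, j: int) -> str:
        """Substring S[i..j] (0-based, inclusive); expands only rules overlapping [i, j]."""
        if not 0 <= i <= j < self.length[self.root]:
            raise ValueError("invalid range")
        out: List[str] = []
        stack = [(self.root, 0)]  # (rule, offset of its expansion in S)
        while stack:
            node, start = stack.pop()
            if start > j or start + self.length[node] - 1 < i:
                continue  # expansion lies entirely outside [i, j]
            rule = self.rules[node]
            if isinstance(rule, str):
                out.append(rule)
            else:
                left, right = rule
                stack.append((right, start + self.length[left]))  # popped second
                stack.append((left, start))                        # popped first
        return "".join(out)


# Usage: an SLP for S = "abaababa"; rule 5 = rule 4 + rule 3 is the start symbol.
g = GrammarString(["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)], alphabet="ab")
print(g.rank("a", 4), g.select("b", 2), g.access(2, 5))  # 3 4 aaba
```

Storing a count for every symbol at every rule is what lets rank and select skip an entire left child in constant time; a grammar annotated with lengths alone would still support access, but counting symbols would then require expanding subtrees.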