Small-space encoding LCE data structure with constant-time queries
Abstract
The longest common extension (LCE) problem is to preprocess a given string $w$ of length $n$ so that the length of the longest common prefix between suffixes of $w$ that start at any two given positions is answered quickly. In this paper, we present a data structure of $O(z\tau^2 + n/\tau)$ words of space which answers LCE queries in $O(1)$ time and can be built in $O(n \log \sigma)$ time, where $1 \le \tau \le \sqrt{n}$ is a parameter, $z$ is the size of the Lempel-Ziv 77 factorization of $w$, and $\sigma$ is the alphabet size. This is an encoding data structure, i.e., it does not access the input string $w$ when answering queries, and thus $w$ can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:
- For highly repetitive strings where the $z\tau^2$ term is dominated by $n/\tau$, we obtain a constant-time and sub-linear space LCE query data structure.
- Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we can still obtain a constant-time and sub-linear space LCE data structure for suitable $\tau$ and for $\sigma \le 2^{o(\log n)}$.
- The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] can be "surpassed" in some cases with our LCE data structure.
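The notions used in the abstract (an LCE query, an encoding data structure, and the LZ77 size $z$) can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the paper's data structure: `lce_naive`, `QuadraticLCETable`, and `lz77_size` are hypothetical names. The table is the trivial encoding that stores every answer in $\Theta(n^2)$ words, i.e., the baseline that an $O(z\tau^2 + n/\tau)$-word structure improves on. `lz77_size` assumes one common self-referential LZ77 variant (each phrase is either a fresh character or the longest prefix that also starts at an earlier position), which may differ in detail from the variant used in the paper.

```python
from typing import List


def lce_naive(w: str, i: int, j: int) -> int:
    """Length of the longest common prefix of w[i:] and w[j:], by direct comparison (O(n) per query)."""
    k = 0
    while i + k < len(w) and j + k < len(w) and w[i + k] == w[j + k]:
        k += 1
    return k


class QuadraticLCETable:
    """Trivial encoding baseline (hypothetical): answers LCE(i, j) in O(1) without keeping w.

    Uses Theta(n^2) words of space, far more than the paper's bound.
    """

    def __init__(self, w: str) -> None:
        n = len(w)
        # table[i][j] = LCE(i, j); row n and column n stand for the empty suffix (all zeros).
        table: List[List[int]] = [[0] * (n + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(n - 1, -1, -1):
                if w[i] == w[j]:
                    # Matching first characters extend the LCE of the next pair of suffixes.
                    table[i][j] = table[i + 1][j + 1] + 1
        self._table = table  # no reference to w is stored, so w can be deleted

    def query(self, i: int, j: int) -> int:
        return self._table[i][j]


def lz77_size(w: str) -> int:
    """Number of phrases z in a greedy LZ77 factorization (assumed self-referential variant)."""
    z, i, n = 0, 0, len(w)
    while i < n:
        # Longest prefix of w[i:] that also starts at some earlier position j < i (overlaps allowed).
        longest = max((lce_naive(w, i, j) for j in range(i)), default=0)
        i += max(longest, 1)  # a character not seen before forms a phrase of length 1
        z += 1
    return z
```

For example, with `w = "banana"`, `QuadraticLCETable(w).query(1, 3)` compares `"anana"` with `"ana"` and returns 3, and `lz77_size(w)` counts the phrases `b`, `a`, `n`, `ana`, giving 4. The quadratic table shows why the encoding property alone is easy; the difficulty addressed in the paper is keeping $O(1)$ query time while shrinking the space toward $O(z\tau^2 + n/\tau)$.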