Efficient LZ78 factorization of grammar compressed text

Abstract

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context-free grammar in Chomsky normal form that generates a single string. Given an SLP of size n representing a text S of length N, our algorithm computes the LZ78 factorization of S in O(n√N + m log N) time and O(n√N + m) space, where m is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the n√N term in the time and space complexities becomes either nL, where L is the length of the longest LZ78 factor, or (N - α), where α ≥ 0 is a quantity that depends on the amount of redundancy that the SLP captures with respect to substrings of S of a certain length. Since m = O(N / log_σ N), where σ is the alphabet size, the latter is asymptotically at least as fast as a linear-time algorithm that runs on the uncompressed string when σ is constant, and can be more efficient when the text is compressible, i.e. when m and n are small.
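To make the two objects in the abstract concrete, here is a minimal Python sketch of an SLP and of the LZ78 factorization. It is not the algorithm of the paper: it is the textbook trie-based LZ78 factorization run on the fully expanded string, which is the uncompressed baseline the abstract compares against. The names `expand_slp`, `lz78_factorize`, and the example grammar `rules` are hypothetical and chosen only for illustration.

```python
def expand_slp(rules, start):
    """Expand an SLP into the single string it generates.

    `rules` maps each variable either to a one-character string (X -> a)
    or to a pair of variables (X -> Y Z), as in Chomsky normal form.
    This decompresses the whole text, so it is for illustration only.
    """
    memo = {}

    def expand(var):
        if var not in memo:
            rhs = rules[var]
            if isinstance(rhs, str):
                memo[var] = rhs
            else:
                left, right = rhs
                memo[var] = expand(left) + expand(right)
        return memo[var]

    return expand(start)


def lz78_factorize(text):
    """Textbook LZ78 factorization using a trie over previous factors.

    Each factor is the longest previous factor that matches at the current
    position, extended by one character. The final factor may be a repeat
    of an earlier one if the text ends mid-match.
    """
    trie = {}        # (node id, character) -> child node id; node 0 is the root
    next_id = 1
    factors = []
    node, start = 0, 0
    for i, ch in enumerate(text):
        child = trie.get((node, ch))
        if child is not None:
            node = child                  # keep following an existing factor
        else:
            trie[(node, ch)] = next_id    # new factor = matched factor + ch
            next_id += 1
            factors.append(text[start:i + 1])
            node, start = 0, i + 1
    if start < len(text):
        factors.append(text[start:])      # leftover suffix, if any
    return factors


# Hypothetical example grammar (assumption, not taken from the paper).
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X1"),
    "X5": ("X4", "X3"),
    "X6": ("X5", "X4"),
}

text = expand_slp(rules, "X6")
factors = lz78_factorize(text)
print(text, factors)
```

In this example the SLP has n = 6 rules and generates a text of length N = 8, which factorizes into m = 5 LZ78 factors. The sketch spends time proportional to N on decompression alone; the point of the paper is to avoid that cost by working on the grammar directly.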
