Time and Space Efficient Lempel-Ziv Factorization based on Run Length Encoding

Abstract

We propose a new approach for calculating the Lempel-Ziv factorization of a string, based on run length encoding (RLE). We present a conceptually simple off-line algorithm based on a variant of suffix arrays, as well as an on-line algorithm based on a variant of directed acyclic word graphs (DAWGs). Both algorithms run in O(N+n n) time and O(n) extra space, where N is the size of the string, n≤ N is the number of RLE factors. The time dependency on N is only in the conversion of the string to RLE, which can be computed very efficiently in O(N) time and O(1) extra space (excluding the output). When the string is compressible via RLE, i.e., n = o(N), our algorithms are, to the best of our knowledge, the first algorithms which require only o(N) extra space while running in o(N N) time.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…