Analyzing and Leveraging the k-Sensitivity of LZ77
Abstract
We study the sensitivity of the Lempel-Ziv 77 compression algorithm to edits, showing how modifying a string $w$ can deteriorate or improve its compression. Our first result is a tight upper bound for $k$ edits: for all $w' \in B(w,k)$, we have $C_{\mathrm{LZ77}}(w') \leq 3 \cdot C_{\mathrm{LZ77}}(w) + 4k$. This result contrasts with Lempel-Ziv 78, where a single edit can significantly deteriorate compressibility, a phenomenon known as a *one-bit catastrophe*. We further refine this bound, focusing on the coefficient $3$ in front of $C_{\mathrm{LZ77}}(w)$, and establish a surprising trichotomy based on the compressibility of $w$. More precisely, we prove the following bounds:
- if $C_{\mathrm{LZ77}}(w) \lesssim k^{3/2}\sqrt{n}$, the compression may increase by up to a factor of $\approx 3$,
- if $k^{3/2}\sqrt{n} \lesssim C_{\mathrm{LZ77}}(w) \lesssim k^{1/3}n^{2/3}$, this factor is at most $\approx 2$,
- if $C_{\mathrm{LZ77}}(w) \gtrsim k^{1/3}n^{2/3}$, the factor is at most $\approx 1$.

Finally, we present an $\varepsilon$-approximation algorithm to pre-edit a word $w$ with a budget of $k$ modifications to improve its compression. In favorable scenarios, this approach yields a total compressed size reduction by up to a factor of $3$, accounting for both the LZ77 compression of the modified word and the cost of storing the edits, $C_{\mathrm{LZ77}}(w') + k \log |w|$.
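To make the first bound concrete, the following is a minimal sketch that counts LZ77 phrases by brute force and compares a word with its one-edit neighbors. It rests on assumptions the abstract does not settle: the compression size is taken to be the number of phrases in a greedy parse where each phrase is the longest earlier-occurring prefix (overlap allowed) plus one extra letter, and an edit is taken to be a single substitution, insertion, or deletion. The names `lz77_phrase_count` and `one_edit_neighbors` are hypothetical helpers for illustration, not the paper's code.

```python
def lz77_phrase_count(w: str) -> int:
    """Count phrases in a greedy LZ77-style parse of w (assumed variant).

    Each phrase is the longest prefix of the unparsed suffix that also
    starts at some earlier position (self-overlap allowed), followed by
    one extra letter. Brute force, for small inputs only.
    """
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier start positions
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += longest + 1  # copied part plus one fresh letter
        phrases += 1
    return phrases


def one_edit_neighbors(w: str, alphabet: str) -> set[str]:
    """All words at exactly one edit from w (assumed edit model:
    substitution, insertion, or deletion of a single letter)."""
    out = set()
    for i in range(len(w) + 1):
        for c in alphabet:
            out.add(w[:i] + c + w[i:])          # insertion before position i
            if i < len(w):
                out.add(w[:i] + c + w[i + 1:])  # substitution at position i
        if i < len(w):
            out.add(w[:i] + w[i + 1:])          # deletion at position i
    out.discard(w)  # a substitution by the same letter is not an edit
    return out


# Compare a highly compressible word with its worst one-edit neighbor (k = 1).
w = "ab" * 8
base = lz77_phrase_count(w)
worst = max(lz77_phrase_count(v) for v in one_edit_neighbors(w, "ab"))
print(base, worst, 3 * base + 4 * 1)
```

The exhaustive search is only practical for short words and k = 1; it is meant as a sanity check of the inequality on toy inputs, not as evidence of tightness.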