Space-Efficient Re-Pair Compression

Abstract

Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let n, σ, and d be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and 5n + 4σ² + 4d + √n words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of ⌈log₂ n⌉ bits and a re-writable input text composed of n such words. Our first algorithm runs in expected O(n/ε) time and uses (1+ε)n + √n words of space on top of the text for any parameter 0 < ε ≤ 1 chosen in advance. Our second algorithm runs in expected O(n log n) time and improves the space to n + √n words.
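
For readers unfamiliar with the scheme, the following is a minimal sketch of the basic Re-Pair idea: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal until no pair occurs twice. It is a naive illustration that rescans the whole sequence in every round, so it meets neither the time nor the space bounds stated above, and it is not either of the algorithms proposed in this work. The function names `count_pairs`, `repair`, and `expand` are hypothetical, as is the choice to represent terminals as one-character strings and nonterminals as integers.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    prev_counted = False
    for i in range(len(seq) - 1):
        # In a run such as 'aaa', the pair at i overlaps the same pair at i-1;
        # count only every other occurrence so the two do not share a symbol.
        if i > 0 and seq[i - 1] == seq[i] == seq[i + 1] and prev_counted:
            prev_counted = False
            continue
        counts[(seq[i], seq[i + 1])] += 1
        prev_counted = True
    return counts


def repair(text):
    """Naive Re-Pair sketch. Returns (final sequence, rules)."""
    seq = list(text)   # terminals are one-character strings
    rules = {}         # nonterminal id (int) -> (left symbol, right symbol)
    next_id = 0
    while len(seq) >= 2:
        pair, freq = count_pairs(seq).most_common(1)[0]
        if freq < 2:
            break      # no pair repeats, so the grammar is final
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)   # replace one occurrence of the pair
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def expand(sym, rules):
    """Expand a symbol back into the text it derives."""
    if isinstance(sym, str):
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)
```

In this sketch, `len(rules)` plays the role of the dictionary size d, and joining `expand(s, rules)` over the returned sequence reproduces the input text.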
