A Parallel Two-Pass MDL Context Tree Algorithm for Universal Source Coding
Abstract
We present a novel lossless universal source coding algorithm that uses parallel computational units to increase throughput. The length-N input sequence is partitioned into B blocks. Processing each block independently of the others can accelerate the computation by a factor of B, but it degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) source underlying the entire input, and then encode each of the B blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is O(N/B). Its redundancy is approximately B log(N/B) bits above Rissanen's lower bound on universal coding performance, with respect to any tree source whose maximal depth is at most log(N/B).
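The two-pass idea can be illustrated with a minimal sketch. This is not the paper's algorithm: it replaces MDL context tree estimation with a fixed-depth context model, reports ideal code lengths instead of producing a bitstream, and runs the per-block steps in ordinary loops where parallel units would be used. The names `count_contexts`, `code_length`, and `two_pass`, the `depth` parameter, and the add-1/2 smoothing are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def count_contexts(block, depth):
    """Pass 1, per block: count (context, symbol) occurrences."""
    counts = Counter()
    for i in range(depth, len(block)):
        counts[(block[i - depth:i], block[i])] += 1
    return counts


def code_length(block, counts, depth, alphabet):
    """Pass 2, per block: ideal code length in bits under the shared model.

    The first `depth` symbols of each block have no full context and are
    skipped here (a simplification of this sketch).
    """
    bits = 0.0
    for i in range(depth, len(block)):
        ctx = block[i - depth:i]
        total = sum(counts[(ctx, a)] for a in alphabet)
        # Add-1/2 smoothing (an assumption of this sketch).
        p = (counts[(ctx, block[i])] + 0.5) / (total + 0.5 * len(alphabet))
        bits -= log2(p)
    return bits


def two_pass(x, num_blocks, depth=1):
    """Estimate one model from all blocks, then score each block with it."""
    alphabet = sorted(set(x))
    size = -(-len(x) // num_blocks)  # ceiling division
    blocks = [x[i:i + size] for i in range(0, len(x), size)]

    # Pass 1: each block could be counted by its own unit; counts are merged.
    merged = Counter()
    for b in blocks:
        merged.update(count_contexts(b, depth))

    # Pass 2: each block could be encoded by its own unit using `merged`.
    return [code_length(b, merged, depth, alphabet) for b in blocks]


lengths = two_pass("ab" * 8, num_blocks=2, depth=1)
```

Because every block is scored against statistics gathered from the whole input, a block does not pay separately for learning the source, which is the intuition behind the small loss from adding more parallel units.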