Towards Better Compressed Representations
Abstract
We introduce the problem of computing a parsing in which each phrase has length at most $m$ and which minimizes the zeroth-order entropy of the parsing. Based on recent theoretical results, we devise a heuristic for this problem. The solution has a straightforward application in succinct text representations and gives practical improvements. Moreover, the proposed heuristic yields a structure whose size can be bounded both by $|S|H_{m-1}(S)$ and by $\frac{|S|}{m}\left(H_0(S) + \cdots + H_{m-1}(S)\right)$, where $H_k(S)$ is the $k$-th order empirical entropy of $S$. We also consider a similar problem in which the first-order entropy is minimized.
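The abstract does not spell out the heuristic, so the Python sketch below only makes the quantities it names concrete: the empirical entropies $H_0(S)$ and $H_k(S)$, and the cost of a parsing $P$ whose phrases are treated as symbols of a new alphabet. It assumes that "zeroth-order entropy of the parsing" means $|P|H_0(P)$ and ignores the cost of storing the phrase dictionary. The helper names (`h0`, `hk`, `parsing_cost`, `block_parsing`) are hypothetical, `hk` uses one common convention for $H_k$ (grouping each symbol by its preceding length-$k$ context), and `block_parsing` is a naive fixed-length baseline, not the heuristic proposed in the paper.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def h0(seq: Sequence[Hashable]) -> float:
    """Zeroth-order empirical entropy of seq, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # sum over symbols of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def hk(s: str, k: int) -> float:
    """k-th order empirical entropy H_k(S), in bits per symbol.

    Assumed convention (one of several in the literature): group every
    position i >= k by its length-k left context s[i-k:i], then average
    H_0 of each group weighted by group size and normalised by |S|.
    """
    if k == 0:
        return h0(s)
    if not s:
        return 0.0
    groups: dict[str, list[str]] = {}
    for i in range(k, len(s)):
        groups.setdefault(s[i - k:i], []).append(s[i])
    return sum(len(g) * h0(g) for g in groups.values()) / len(s)


def parsing_cost(phrases: Sequence[str]) -> float:
    """Bits to encode the phrase sequence with a zeroth-order code,
    i.e. |P| * H_0(P), treating each distinct phrase as one symbol.
    The cost of storing the phrase dictionary itself is ignored."""
    return len(phrases) * h0(phrases)


def block_parsing(s: str, m: int) -> list[str]:
    """Hypothetical baseline: cut s into consecutive phrases of length m
    (the last one may be shorter), so every phrase has length <= m."""
    if m < 1:
        raise ValueError("m must be positive")
    return [s[i:i + m] for i in range(0, len(s), m)]


if __name__ == "__main__":
    s = "abracadabra" * 4
    m = 3
    n = len(s)
    phrases = block_parsing(s, m)
    print("|P| H_0(P), fixed-length blocks:", parsing_cost(phrases))
    print("|S| H_{m-1}(S):", n * hk(s, m - 1))
    print("|S|/m (H_0 + ... + H_{m-1}):", n / m * sum(hk(s, k) for k in range(m)))
```

Any parsing with phrases of length at most $m$ can be scored with `parsing_cost`; the optimisation problem in the abstract asks for a parsing that minimises this score, which the fixed-length baseline need not achieve.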