A Compressed-Gap Data-Aware Measure
Abstract
In this paper, we consider the problem of efficiently representing a set S of n items out of a universe U = {0, ..., u-1} while supporting a number of operations on it. Let G = g_1 ... g_n be the gap stream associated with S, gap its bit-size when encoded with gap-encoding, and H_0(G) its empirical zero-order entropy. We prove that (1) nH_0(G) ∈ o(gap) if G is highly compressible, and (2) nH_0(G) ≤ n log(u/n) + n ≤ uH_0(S). Let d be the number of distinct gap lengths between elements in S. We first propose a new space-efficient zero-order compressed representation of S taking n(H_0(G)+1) + O(d log u) bits of space. We then describe a fully-indexable dictionary that supports rank and select queries in O(log(u/n) + log log u) time while requiring asymptotically the same space as the proposed compressed representation of S.
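To make these quantities concrete, the following Python sketch computes the gap stream, a gap-encoding bit-size, the zero-order entropy of the gaps, and u times the zero-order entropy of the characteristic bitvector of S for a small set. It is only an illustration under stated assumptions, not the representation proposed in the paper: the first gap is measured from an imaginary predecessor at position -1, each gap g is charged floor(log2 g) + 1 bits, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def gap_stream(sorted_items):
    """Return the gaps g_1 ... g_n between consecutive items.

    Assumption: the first gap is measured from a virtual predecessor -1,
    so every gap is at least 1.
    """
    gaps = []
    prev = -1
    for x in sorted_items:
        gaps.append(x - prev)
        prev = x
    return gaps


def gap_bits(gaps):
    """Bit-size of the gaps, charging floor(log2 g) + 1 bits per gap (assumed cost model)."""
    return sum(g.bit_length() for g in gaps)


def zero_order_entropy(symbols):
    """Empirical zero-order entropy, in bits per symbol, of a sequence."""
    n = len(symbols)
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())


def bitvector_entropy_bits(n, u):
    """u * H_0 of a length-u bitvector containing n ones."""
    return sum(c * log2(u / c) for c in (n, u - n) if c > 0)


# A perfectly regular set: every gap equals 3, so d = 1 and H_0(G) = 0.
u = 32
S = [2, 5, 8, 11, 14, 17, 20, 23]
n = len(S)
G = gap_stream(S)
d = len(set(G))

entropy_bits = n * zero_order_entropy(G)   # n * H_0(G)
middle_bound = n * log2(u / n) + n         # n log(u/n) + n
upper_bound = bitvector_entropy_bits(n, u) # u * H_0(S)
print(G, d, gap_bits(G), entropy_bits, middle_bound, upper_bound)
```

On this regular input the gap stream has a single distinct symbol, so nH_0(G) is zero while the gap-encoding still spends two bits per element. This is the kind of highly compressible case that claim (1) refers to.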