More Haste, Less Waste: Lowering the Redundancy in Fully Indexable Dictionaries

Abstract

We consider the problem of representing, in a compressed format, a bit-vector \(S\) of \(m\) bits with \(n\) 1s, supporting the following operations, where \(b \in \{0, 1\}\): \(\mathrm{rank}_b(S,i)\) returns the number of occurrences of bit \(b\) in the prefix \(S[1..i]\); \(\mathrm{select}_b(S,i)\) returns the position of the \(i\)th occurrence of bit \(b\) in \(S\). Such a data structure is called a fully indexable dictionary (FID) [Raman et al., 2007], and is at least as powerful as predecessor data structures. Our focus is on space-efficient FIDs on the RAM model with word size \(\Theta(\log m)\) and constant time for all operations, so that the time cost is independent of the input size. Given the bitstring \(S\) to be encoded, having length \(m\) and containing \(n\) ones, the minimal amount of information that needs to be stored is \(B(n,m) = \lceil \log \binom{m}{n} \rceil\) bits. The state of the art in building a FID for \(S\) is given in [Patrascu, 2008], using \(B(n,m) + O(m / ((\log m)/t)^t) + O(m^{3/4})\) bits to support the operations in \(O(t)\) time. Here, we propose a parametric data structure exhibiting a time/space trade-off such that, for any real constants \(0 < \delta \le 1/2\), \(0 < \varepsilon \le 1\), and integer \(s > 0\), it uses \[ B(n,m) + O\left(n^{1+\delta} + n \left(\frac{m}{n^s}\right)^{\varepsilon}\right) \] bits and performs all the operations in time \(O(s\delta^{-1} + \varepsilon^{-1})\). The improvement is twofold: our redundancy can be lowered parametrically and, fixing \(s = O(1)\), we get a constant-time FID whose space is \(B(n,m) + O(m^{\varepsilon} / \mathrm{poly}(n))\) bits, for sufficiently large \(m\). This is a significant improvement compared to the previous bounds for the general case.
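To make the semantics of the operations concrete, the following is a minimal reference sketch in Python. It is not the data structure proposed here: it stores S uncompressed and runs in linear time, whereas a FID answers the same queries in constant time using close to B(n,m) bits. The function names, the list-of-bits representation, the `predecessor` helper (the standard reduction of predecessor search to select_1 applied to rank_1), and the use of base-2 logarithms in `info_bound` are assumptions made for illustration.

```python
from math import comb


def rank(S, b, i):
    """Number of occurrences of bit b in the prefix S[1..i] (1-indexed, inclusive)."""
    return sum(1 for x in S[:i] if x == b)


def select(S, b, i):
    """1-indexed position of the i-th occurrence of bit b in S, or None if absent."""
    count = 0
    for pos, x in enumerate(S, start=1):
        if x == b:
            count += 1
            if count == i:
                return pos
    return None


def predecessor(S, y):
    """Largest position x < y with S[x] = 1, computed as select_1(S, rank_1(S, y - 1))."""
    r = rank(S, 1, y - 1)
    return select(S, 1, r) if r > 0 else None


def info_bound(n, m):
    """B(n, m) = ceil(log2(C(m, n))), assuming logarithms are taken base 2."""
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)) exactly.
    return (comb(m, n) - 1).bit_length()


# Illustrative input (hypothetical): m = 8 bits, n = 4 ones at positions 2, 3, 5, 8.
S = [0, 1, 1, 0, 1, 0, 0, 1]
print(rank(S, 1, 5), select(S, 0, 2), predecessor(S, 5), info_bound(4, 8))
```

The integer computation in `info_bound` avoids floating-point rounding, which matters once the binomial coefficient exceeds the precision of a double.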
