Hardware Architecture for Inplace Compute of Burrows-Wheeler Transform in a Single Iteration

Abstract

The Burrows-Wheeler transform (BWT) is used by the bzip2 family of compressors. In this paper, we present a hardware architecture that implements an in-place algorithm to compute the BWT. Our design has no explicit storage for the suffix array or for the output array. The performance of our implementation is fixed and does not depend on the content of the input string. We use a register-based character buffer in a scan-chain configuration, so that the BWT is computed from right to left as characters are loaded. A new character is loaded every six cycles, and an output character from the previously computed block is produced at the same rate. Our FPGA implementation does not use block RAM instances and achieves throughputs of 66, 35, 18, and 15 MB/s for block sizes of 128 B, 1 kB, 4 kB, and 8 kB, respectively. We also report results for an ASIC implementation in 65 nm CMOS that achieves 161 MB/s with a block size of 128 B.
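The abstract does not spell out the in-place algorithm. As a rough software model only, the Python sketch below shows one standard way to build a BWT from right to left inside a single buffer: the unread prefix of the text and the BWT of the suffix processed so far share one array of n+1 cells. Each new character is placed by computing the rank of the new suffix (rows that start with a smaller symbol, plus rows starting with the same symbol that sort before the previous full suffix) and shifting part of the buffer by one cell. The `$` sentinel, the names `bwt_reference` and `bwt_inplace`, and this exact update rule are assumptions for illustration, not the authors' design; bzip2 itself sorts cyclic rotations and stores an origin index instead of using a sentinel. A naive suffix-sorting reference is included for cross-checking.

```python
SENTINEL = "$"  # assumed end-of-text marker; treated as smaller than every input character


def bwt_reference(text: str) -> str:
    """Naive BWT: sort all suffixes of text + SENTINEL and emit each one's preceding char."""
    assert SENTINEL not in text
    s = text + SENTINEL

    # Map the sentinel to -1 so it compares below any real character.
    def key(i):
        return [-1 if ch == SENTINEL else ord(ch) for ch in s[i:]]

    sa = sorted(range(len(s)), key=key)
    # s[i - 1] wraps around to the sentinel when i == 0 (cyclic predecessor).
    return "".join(s[i - 1] for i in sa)


def bwt_inplace(text: str) -> str:
    """Right-to-left incremental BWT using one buffer of len(text) + 1 cells.

    Invariant before processing position j: buf[j+1:] holds the BWT of
    text[j+1:] + SENTINEL, and buf[:j+1] still holds the unread prefix.
    """
    assert SENTINEL not in text
    buf = list(text) + [SENTINEL]  # the BWT of SENTINEL alone is SENTINEL
    for j in range(len(text) - 1, -1, -1):
        c = buf[j]                          # next character, read right to left
        lo = j + 1                          # start of the current BWT region
        p = buf.index(SENTINEL, lo) - lo    # row of the previous full suffix
        # Rank of the new suffix c + text[j+1:] + SENTINEL:
        # rows starting with a smaller symbol, plus c-rows sorting before row p.
        smaller = sum(1 for x in buf[lo:] if x == SENTINEL or x < c)
        occ = buf[lo:lo + p].count(c)
        r = smaller + occ
        buf[lo + p] = c                     # old full suffix is now preceded by c
        # Shift rows 0..r-1 one cell left into the freed slot buf[j] ...
        for k in range(r):
            buf[j + k] = buf[j + k + 1]
        buf[j + r] = SENTINEL               # ... and insert the new row at rank r
    return "".join(buf)


if __name__ == "__main__":
    print(bwt_inplace("banana"))  # annb$aa
```

The one-cell shift in the loop is loosely analogous to moving data along a scan chain, but how the hardware maps these steps onto its six-cycle load interval is not described in the abstract.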
