Enumeration of sequences with large alphabets

Abstract

This study focuses on efficient schemes for enumerative coding of $\sigma$-ary sequences, borrowing mainly from Öktem & Astola's [Oktem99] hierarchical enumerative coding and Schalkwijk's [Schalkwijk72] asymptotically optimal combinatorial code on binary sequences. Observing that the number of distinct $\sigma$-dimensional vectors with an inner sum of $n$, where the value in each dimension lies in the range $[0 \ldots n]$, is $K(\sigma,n) = \sum_{i=0}^{\sigma-1} \binom{n-1}{\sigma-1-i} \binom{\sigma}{i}$, we propose representing the $C$ vector via enumeration and present the algorithms necessary to perform this task. We prove that $\log K(\sigma,n)$ requires approximately $(\sigma-1)\log(\sigma-1)$ fewer bits than the naive $(\sigma-1)\lceil \log(n+1) \rceil$ representation for relatively large $n$, and examine the results experimentally for varying alphabet sizes. We extend the basic scheme for the enumerative coding of $\sigma$-ary sequences by introducing a new method for large alphabets. Experiments on DNA sequences show that the newly introduced technique is superior to the basic scheme.
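
The counting claim in the abstract can be checked numerically. The sketch below is only an illustration, not the paper's algorithms: it evaluates the sum for K(σ, n), compares it with the standard stars-and-bars closed form C(n+σ-1, σ-1), and compares the bit cost of an enumerative index with the naive fixed-width encoding of σ-1 counts. The function names (`K`, `lex_rank`, `enumerative_bits`, `naive_bits`) are hypothetical, the lexicographic ranking is a generic textbook construction assumed here for illustration, and n ≥ 1 is assumed.

```python
from itertools import product
from math import comb


def K(sigma: int, n: int) -> int:
    """Sum from the abstract: number of sigma-dim vectors over [0..n] summing to n (n >= 1)."""
    return sum(comb(n - 1, sigma - 1 - i) * comb(sigma, i) for i in range(sigma))


def compositions(total: int, parts: int) -> int:
    """Stars and bars: ways to write `total` as an ordered sum of `parts` non-negative ints."""
    return comb(total + parts - 1, parts - 1)


def lex_rank(vec: list[int]) -> int:
    """Lexicographic index of `vec` among all vectors of the same length and sum.

    Generic ranking used for illustration; not taken from the paper.
    """
    remaining = sum(vec)
    rank = 0
    for j, value in enumerate(vec[:-1]):
        parts_after = len(vec) - j - 1
        # Count every vector that agrees on the prefix but has a smaller entry here.
        for smaller in range(value):
            rank += compositions(remaining - smaller, parts_after)
        remaining -= value
    return rank


def enumerative_bits(sigma: int, n: int) -> int:
    """Bits needed to index one of K(sigma, n) vectors, i.e. ceil(log2 K)."""
    return (K(sigma, n) - 1).bit_length()


def naive_bits(sigma: int, n: int) -> int:
    """Bits for sigma-1 counts stored in ceil(log2(n+1)) bits each (n >= 1)."""
    return (sigma - 1) * n.bit_length()


if __name__ == "__main__":
    for sigma, n in [(4, 1000), (16, 10**6), (64, 10**6)]:
        print(sigma, n, enumerative_bits(sigma, n), naive_bits(sigma, n))
```

Running the script prints, for each (σ, n) pair, the enumerative bit count next to the naive one, so the gap can be compared against the (σ-1) log(σ-1) estimate stated above.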
