Bounds and Constructions of Codes for Ordered Composite DNA Sequences
Abstract
This paper extends the foundational work of Dollma et al. on codes for ordered composite DNA sequences. We consider the general setting with an alphabet of size $q$ and a resolution parameter $k$, moving beyond the binary ($q=2$) case that was the main focus of prior work. We investigate error-correcting codes for substitution errors and deletion errors under several channel models: the $(e_1,\ldots,e_k)$-composite error/deletion model, the $e$-composite error/deletion model, and the newly introduced $t$-$(e_1,\ldots,e_t)$-composite error/deletion model. We first establish equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs). This significantly reduces the number of distinct error-parameter sets that require separate analysis. We then derive novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods. Together, these bounds cover all values of the parameters $q$, $k$, $(e_1,\ldots,e_k)$ and $e$. In contrast, previous bounds were established only for $q=2$ and limited choices of $k$, $(e_1,\ldots,e_k)$ and $e$. For CDCCs, we generalize a known non-asymptotic upper bound for $(1,0,\ldots,0)$-CDCCs and then provide a cleaner asymptotic bound. On the constructive side, for any $q \ge 2$, we propose $(1,0,\ldots,0)$-CDCCs, $1$-CDCCs and $t$-$(1,\ldots,1)$-CDCCs with near-optimal redundancies. These codes have efficient and systematic encoders. For substitution errors, we design the first explicit encoding and decoding algorithms for the binary $(1,0,\ldots,0)$-CECC constructed by Dollma et al., and extend the approach to general $q$. Furthermore, we give an improved construction of binary $1$-CECCs, a construction of nonbinary $1$-CECCs, and a construction of $t$-$(1,\ldots,1)$-CECCs. These constructions are also systematic.
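For orientation, the sketch below computes the classical sphere-packing (Hamming) bound for ordinary $q$-ary codes of length $n$ that correct $e$ substitutions. It assumes plain Hamming-distance substitution errors on length-$n$ words. This is only the textbook baseline that sphere-packing arguments start from and does not reproduce the paper's refined bounds for the composite error models. The function names are hypothetical.

```python
from math import comb


def hamming_ball_size(n: int, q: int, e: int) -> int:
    """Count q-ary words of length n within Hamming distance e of a fixed word.

    Classical substitution model only (not the composite error model):
    choose i positions to change, each to one of q - 1 other symbols.
    """
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))


def sphere_packing_bound(n: int, q: int, e: int) -> int:
    """Classical Hamming bound on the size of an e-substitution-correcting q-ary code.

    Radius-e balls around distinct codewords must be disjoint, so
    |C| * V(n, q, e) <= q**n. Floor division is valid because |C| is an integer.
    """
    return q ** n // hamming_ball_size(n, q, e)


if __name__ == "__main__":
    # Binary, length 7, one substitution: ball size 1 + 7 = 8, bound 128 // 8 = 16.
    print(sphere_packing_bound(7, 2, 1))
```

For $n=7$, $q=2$, $e=1$ the bound is 16, which is attained by the binary Hamming code of length 7.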