Asymptotically Optimal Codes for Correcting Burst Deletions and Insertions in Labeled DNA Sequences

Abstract

Fluorescent labeling is a cornerstone of DNA visualization and a key enabler of random access in DNA-based data storage. However, the stochastic nature of the biochemical processes involved, including synthesis, hybridization, and optical readout, induces burst synchronization errors in the resulting labeling sequences. To address this challenge, we formally introduce burst \(t\)-deletion/insertion A-labeling codes, which correct a single burst of \(t\) deletions or insertions in the label domain. Our contributions are threefold.

Fundamental limit. We establish an information-theoretic lower bound of \(\log_4 n + O(1)\) on the redundancy of any such code, for all \(t \geq 1\) with \(t \mid n\). To the best of our knowledge, this is the first information-theoretic lower bound, even for the single-error case \(t=1\).

Explicit construction. For \(t \geq 2\), \(t \mid n\), and \(n \geq 7t + 3\), we propose explicit encoding and decoding algorithms, both running in \(O(n^2)\) time. A novel generalized Run-Length Limited (RLL) constraint is introduced to bridge the structural mismatch between the DNA encoding domain and the label error domain.

Asymptotic optimality. The proposed scheme achieves redundancy \(\log_4 n + (t-1)\log_4 \log_{8/3} n + O(1)\), matching the dominant term of the lower bound up to an \(O(\log\log n)\) overhead, which makes the construction asymptotically optimal for fixed \(t\).
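To make the error model concrete, the following is a minimal Python sketch of a single burst of t deletions or insertions, together with a classical run-length check. It is an illustration under stated assumptions, not the paper's construction: it operates on a generic symbol string, the function names (`burst_delete`, `burst_insert`, `max_run_length`, `satisfies_rll`) are hypothetical, and neither the labeling map nor the paper's generalized RLL constraint is modeled, since the abstract does not specify them.

```python
from itertools import groupby


def burst_delete(seq: str, start: int, t: int) -> str:
    """Delete t consecutive symbols of seq, beginning at index start."""
    if t < 0 or start < 0 or start + t > len(seq):
        raise ValueError("burst must lie inside the sequence")
    return seq[:start] + seq[start + t:]


def burst_insert(seq: str, start: int, burst: str) -> str:
    """Insert the symbols of burst consecutively before index start."""
    if start < 0 or start > len(seq):
        raise ValueError("insertion point must lie inside the sequence")
    return seq[:start] + burst + seq[start:]


def max_run_length(seq: str) -> int:
    """Length of the longest run of identical consecutive symbols."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def satisfies_rll(seq: str, limit: int) -> bool:
    """Classical RLL check (an assumption here, not the generalized
    constraint from the paper): no run is longer than limit."""
    return max_run_length(seq) <= limit


original = "ACGGGTAC"
received = burst_delete(original, start=2, t=3)   # removes the run "GGG"
restored = burst_insert(received, start=2, burst="GGG")
```

Here `received` is `"ACTAC"` and `restored` equals `original`. A decoder, of course, knows neither the burst position nor the deleted symbols. In classical burst-deletion constructions, bounding run lengths is one way to narrow down where the burst could have occurred.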
