Improved constructions of secondary structure avoidance codes for DNA sequences
Abstract
In a DNA sequence, we have the celebrated Watson-Crick complement $\overline{T}=A$, $\overline{A}=T$, $\overline{C}=G$, and $\overline{G}=C$. Given an integer $m \geq 2$, a secondary structure in a DNA sequence refers to the existence of two non-overlapping reverse-complement consecutive subsequences of length $m$, denoted as $x=(x_1, \dots, x_m)$ and $y=(y_1, \dots, y_m)$, such that $x_i=\overline{y_{m-i+1}}$ for $1 \leq i \leq m$. The property of secondary structure avoidance (SSA) forbids a sequence from containing such reverse-complement subsequences, and it is a key criterion in the design of single-stranded DNA sequences for DNA computing and storage. In this paper, we improve on a recent result of Nguyen et al. by introducing explicit constructions of secondary structure avoidance codes and analyzing the capacity for any given $m$. In particular, our constructions have optimal rates of 1.1679 bits/nt and 1.5515 bits/nt when $m=2$ and $m=3$, respectively.
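To make the SSA condition concrete, the following is a minimal sketch of a checker that follows the definition in the abstract only. It is not the paper's construction or encoder; the function names `reverse_complement` and `has_secondary_structure` are hypothetical and chosen here for illustration.

```python
# Watson-Crick complement, as stated in the abstract.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(s))


def has_secondary_structure(seq: str, m: int) -> bool:
    """Return True if seq contains two non-overlapping length-m substrings
    x and y with x equal to the reverse complement of y.

    Illustrative helper (an assumption, not from the paper).
    """
    n = len(seq)
    # Earliest start index of every length-m window.
    first_seen = {}
    for start in range(n - m + 1):
        first_seen.setdefault(seq[start:start + m], start)

    # For each window y starting at j, look for an earlier window x that
    # equals the reverse complement of y and ends before y begins.
    # Checking only earlier windows suffices because the relation is symmetric.
    for j in range(n - m + 1):
        target = reverse_complement(seq[j:j + m])
        i = first_seen.get(target)
        if i is not None and i + m <= j:
            return True
    return False


def is_ssa(seq: str, m: int) -> bool:
    """A sequence satisfies SSA for a given m if it has no such pair."""
    return not has_secondary_structure(seq, m)
```

For example, with m=2 the sequence `ACGT` violates SSA because `AC` and `GT` are non-overlapping and reverse complements of each other, whereas `ACG` satisfies it: its only self-matching window `CG` cannot be paired with a second, non-overlapping copy.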