Random Access in DNA Storage: Algorithms, Constructions, and Bounds
Abstract
As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational cost and retrieval latency. This paper addresses the recently studied Random Access Problem, which asks for the expected number of reads that must be sampled to recover a specific information strand from n encoded strands. We propose a novel algorithm that computes the exact expected number of reads with computational complexity O(n) for fixed field size q and information length k. Furthermore, we derive explicit formulas for the average and maximum expected number of reads, enabling an efficient search for optimal generator matrices under small parameters. Beyond theoretical analysis, we present new code constructions that improve the best-known upper bound from 0.8815k to 0.8811k for k=3, and achieve an upper bound of 0.8629k for k=4 for sufficiently large q. We also establish a tighter theoretical lower bound on the expected number of reads that improves upon state-of-the-art bounds. In particular, this bound establishes the optimality of the simple parity code for the case n=k+1 over any alphabet size q.
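To make the quantity in the abstract concrete, the following is a minimal brute-force sketch, not the paper's O(n) algorithm. It assumes the usual model for this problem: each read is one of the n encoded strands, drawn uniformly at random with replacement, and information strand i is recovered once the unit vector e_i lies in the span of the generator-matrix columns read so far. It also assumes q is prime and that the full set of columns recovers every information strand. The names `rank_mod_p` and `expected_reads` are hypothetical, and the subset enumeration is exponential in n, so it is only usable for tiny parameters.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank_mod_p(vectors, p):
    """Rank of a list of vectors over the prime field F_p (Gaussian elimination)."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def expected_reads(columns, i, p):
    """Exact expected number of reads to recover information strand i.

    columns: the n columns of a k x n generator matrix over F_p (assumed model).
    Uses E[T] = sum_{s=0}^{n-1} Pr[first s distinct columns fail] * n / (n - s),
    where n / (n - s) is the mean wait for a new distinct column.
    """
    n, k = len(columns), len(columns[0])
    target = [1 if j == i else 0 for j in range(k)]
    total = Fraction(0)
    for s in range(n):
        # Count s-subsets of columns whose span does NOT contain e_i.
        failing = sum(
            1
            for subset in combinations(columns, s)
            if rank_mod_p(list(subset) + [target], p) != rank_mod_p(list(subset), p)
        )
        total += Fraction(failing, comb(n, s)) * Fraction(n, n - s)
    return total


# Simple parity code for k = 3 over F_2: three unit vectors plus their sum.
parity = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
identity = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(expected_reads(parity, 0, 2), expected_reads(identity, 0, 2))
```

Under these assumptions both the uncoded identity matrix and the simple parity code give an expected value of exactly k reads per strand, which is the value the abstract's lower bound shows cannot be improved when n=k+1.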