Bounds and Constructions of ℓ-Read Codes under the Hamming Metric

Abstract

Nanopore sequencing is a promising technology for DNA sequencing. In this paper, we investigate a specific model of the nanopore sequencer, which takes a q-ary sequence of length n as input and outputs a vector of length n+ℓ-1, referred to as an ℓ-read vector, where the i-th entry is a multi-set composed of the ℓ elements located between the (i-ℓ+1)-th and i-th positions of the input sequence. Considering the presence of substitution errors in the output vector, we study ℓ-read codes under the Hamming metric. An ℓ-read (n,d)_q-code is a set of q-ary sequences of length n in which the Hamming distance between the ℓ-read vectors of any two distinct sequences is at least d. We first improve the result of Banerjee et al., who studied ℓ-read (n,d)_q-codes with the constraint ℓ ≥ 3 and d = 3. Then, we investigate the bounds and constructions of 2-read codes with a minimum distance of 3, 4, and 5, respectively. Our results indicate that when d ∈ {3,4}, the optimal redundancy of 2-read (n,d)_q-codes is o(log_q n), while for d = 5 it is log_q n + o(log_q n). Additionally, we establish an equivalence between 2-read (n,3)_q-codes and classical q-ary single-insertion reconstruction codes using two noisy reads. We improve the lower bound on the redundancy of classical q-ary single-insertion reconstruction codes as well as the upper bound on the redundancy of classical q-ary single-deletion reconstruction codes when using two noisy reads. Finally, we study ℓ-read codes under the reconstruction model.
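The definitions in the abstract can be made concrete with a minimal Python sketch. It is an illustration only, not the paper's construction. The function names (`read_vector`, `read_distance`, `is_read_code`) are hypothetical, and the sketch assumes that window positions falling outside the input sequence are simply ignored, so the first and last entries contain fewer than ℓ elements. Each multi-set is represented as a sorted tuple.

```python
from itertools import combinations


def read_vector(x, ell):
    """Return the ell-read vector of sequence x as a list of sorted tuples.

    Entry i (1-indexed, i = 1..n+ell-1) is the multi-set of symbols at
    positions i-ell+1..i of x. Assumption: positions outside 1..n are ignored.
    """
    n = len(x)
    entries = []
    for i in range(1, n + ell):
        lo = max(1, i - ell + 1)   # clip the window to the sequence
        hi = min(n, i)
        window = x[lo - 1:hi]      # convert 1-indexed bounds to a slice
        entries.append(tuple(sorted(window)))
    return entries


def read_distance(x, y, ell):
    """Hamming distance between the ell-read vectors of x and y (equal lengths)."""
    rx, ry = read_vector(x, ell), read_vector(y, ell)
    return sum(a != b for a, b in zip(rx, ry))


def is_read_code(code, ell, d):
    """Brute-force check that every pair of distinct codewords is at distance >= d."""
    return all(read_distance(x, y, ell) >= d for x, y in combinations(code, 2))


# The 2-read vector of a binary sequence of length 4 has 4 + 2 - 1 = 5 entries.
print(read_vector("0110", 2))
print(read_distance("0110", "0100", 2))
```

The pairwise check in `is_read_code` is quadratic in the code size, so it is only suitable for verifying small examples by hand.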
