Efficient average-case population recovery in the presence of insertions and deletions

Abstract

Several recent works have considered the trace reconstruction problem, in which an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of $x$. The goal is to reconstruct the original string $x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces [DOS17, NazarovPeres17], highly efficient algorithms are known [PZ17, HPP18] for the average-case version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, there is an unknown distribution $D$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $D$ and returning an independent trace of $x^i$. Building on [PZ17] and [HPP18], we give an efficient algorithm for this problem. For any support size $s \le \exp(\Theta(n^{1/3}))$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $D$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers $D$ up to total variation distance $\varepsilon$ with high probability, given access to independent traces of independent draws from $D$. The algorithm runs in time $\mathrm{poly}(n,s,1/\varepsilon)$ and its sample complexity is $\mathrm{poly}(s,1/\varepsilon,\exp(\log^{1/3} n))$. This polynomial dependence on the support size $s$ is in sharp contrast with the worst-case version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm [BCFSS19] is doubly exponential in $s$.
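
The sampling model described above can be made concrete with a short simulation. The following Python sketch is illustrative only and is not taken from the paper: the abstract does not fix the channel parameters or the exact insertion rule, so the names `trace`, `sample`, `delete_prob`, and `insert_prob`, as well as the specific insertion model (a geometric number of uniformly random bits inserted before each source bit), are assumptions made for this example.

```python
import random

def trace(x, delete_prob, insert_prob, rng):
    """Return one noisy trace of the bit list x under an assumed channel model."""
    out = []
    for bit in x:
        # Assumed insertion model: before each source bit, keep inserting
        # uniformly random bits, continuing with probability insert_prob.
        while rng.random() < insert_prob:
            out.append(rng.randrange(2))
        # Each source bit is deleted independently with probability delete_prob.
        if rng.random() >= delete_prob:
            out.append(bit)
    return out

def sample(support, weights, delete_prob, insert_prob, rng):
    """Draw a source string from D = (support, weights), then return a trace of it."""
    x = rng.choices(support, weights=weights, k=1)[0]
    return trace(x, delete_prob, insert_prob, rng)

# Average-case setting: the s support strings are uniformly random.
rng = random.Random(0)
n, s = 16, 3
support = [[rng.randrange(2) for _ in range(n)] for _ in range(s)]
weights = [0.5, 0.3, 0.2]  # hypothetical distribution D over the support

# Each sample hides which source string produced it.
samples = [sample(support, weights, 0.1, 0.1, rng) for _ in range(5)]
for t in samples:
    print("".join(map(str, t)))
```

A recovery algorithm sees only such traces, without the identity of the source string behind each one, and must output an estimate of both the support strings and their weights that is within total variation distance $\varepsilon$ of $D$.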
