Efficient average-case population recovery in the presence of insertions and deletions
Abstract
Several recent works have considered the trace reconstruction problem, in which an unknown source string x ∈ {0,1}^n is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a trace of x. The goal is to reconstruct the original string x from independent traces of x. While the best algorithms known for worst-case strings use exp(O(n^{1/3})) traces [DOS17, NazarovPeres17], highly efficient algorithms are known [PZ17, HPP18] for the average-case version, in which x is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call average-case population recovery in the presence of insertions and deletions. In this problem, there is an unknown distribution D over s unknown source strings x1,…,xs ∈ {0,1}^n, and each sample is independently generated by drawing some xi from D and returning an independent trace of xi. Building on [PZ17] and [HPP18], we give an efficient algorithm for this problem. For any support size s ≤ exp(Θ(n^{1/3})), for a 1-o(1) fraction of all s-element support sets {x1,…,xs} ⊂ {0,1}^n, for every distribution D supported on {x1,…,xs}, our algorithm efficiently recovers D up to total variation distance ε with high probability, given access to independent traces of independent draws from D. The algorithm runs in time poly(n, s, 1/ε) and its sample complexity is poly(s, 1/ε, exp(log^{1/3} n)). This polynomial dependence on the support size s is in sharp contrast with the worst-case version (when x1,…,xs may be any strings in {0,1}^n), in which the sample complexity of the most efficient known algorithm [BCFSS19] is doubly exponential in s.
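The sampling model described in the abstract can be illustrated with a minimal Python sketch. The abstract does not fix the exact channel, so the details here are assumptions made only for illustration: each source bit is deleted independently with probability delete_p, and before each source bit a run of uniformly random bits is inserted, continuing with probability insert_p. The function names trace and sample and both parameters are hypothetical, not taken from the paper. The sketch only generates samples; it is not the recovery algorithm.

```python
import random

def trace(x, delete_p, insert_p, rng):
    """Return one trace of the bit list x under an assumed insertion/deletion channel."""
    out = []
    for bit in x:
        # Assumed insertion model: before each source bit, keep inserting
        # uniformly random bits while a coin with bias insert_p comes up heads.
        while rng.random() < insert_p:
            out.append(rng.randint(0, 1))
        # Assumed deletion model: each source bit is dropped independently
        # with probability delete_p.
        if rng.random() >= delete_p:
            out.append(bit)
    return out

def sample(support, weights, delete_p, insert_p, rng):
    """Draw a source string from D (given by weights), then return an independent trace of it."""
    x = rng.choices(support, weights=weights, k=1)[0]
    return trace(x, delete_p, insert_p, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 20, 3
    # Average-case setting: the s support strings are uniformly random.
    support = [[rng.randint(0, 1) for _ in range(n)] for _ in range(s)]
    weights = [0.5, 0.3, 0.2]  # the unknown distribution D, chosen arbitrarily here
    for _ in range(5):
        print("".join(map(str, sample(support, weights, 0.1, 0.1, rng))))
```

A recovery algorithm in this setting sees only the output of repeated calls to sample and must estimate both the support strings and the weights.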