Improved Algorithms for Population Recovery from the Deletion Channel

Shyam Narayanan

Abstract

The population recovery problem asks one to recover an unknown distribution over $n$-bit strings given access to independent noisy samples of strings drawn from the distribution. Recently, Ban et al. [BCF+19] studied the problem where the noise is induced through the deletion channel. This problem generalizes the famous trace reconstruction problem, where one wishes to learn a single string under the deletion channel. Ban et al. showed how to learn $\ell$-sparse distributions over strings using $\exp\big(n^{1/2} \cdot (\log n)^{O(\ell)}\big)$ samples. In this work, we learn the distribution using only $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^2\big)$ samples, by developing a higher-moment analog of the algorithms of [DOS17, NP17], which solve trace reconstruction in $\exp\big(\tilde{O}(n^{1/3})\big)$ samples. We also give the first algorithm with a runtime subexponential in $n$, solving population recovery in $\exp\big(\tilde{O}(n^{1/3}) \cdot \ell^3\big)$ samples and time. Notably, our dependence on $n$ nearly matches the upper bound of [DOS17, NP17] when $\ell = O(1)$, and we reduce the dependence on $\ell$ from doubly to singly exponential. Therefore, we are able to learn large mixtures of strings: while Ban et al.'s algorithm can only learn a mixture of $O(\log n / \log \log n)$ strings with a subexponential number of samples, we are able to learn a mixture of $n^{o(1)}$ strings in $\exp\big(n^{1/3 + o(1)}\big)$ samples and time.
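To make the sampling model concrete, the following is a minimal Python sketch of how one noisy sample (a "trace") could be generated in population recovery under the deletion channel. It is an illustration only and not the recovery algorithm of the paper. The function names, the example strings and weights, and the deletion probability `q` are assumptions chosen for illustration; the abstract does not fix a value for the deletion probability.

```python
import random


def deletion_channel(x, q, rng):
    """Return a trace of x: each bit is deleted independently with probability q.

    q is an assumed parameter; the surviving bits keep their original order.
    """
    return "".join(bit for bit in x if rng.random() >= q)


def sample_trace(strings, weights, q, rng):
    """Draw one string from the mixture, then pass it through the deletion channel."""
    x = rng.choices(strings, weights=weights, k=1)[0]
    return deletion_channel(x, q, rng)


# Hypothetical 2-sparse distribution over 8-bit strings.
rng = random.Random(0)
strings = ["10110010", "00001111"]
weights = [0.7, 0.3]
traces = [sample_trace(strings, weights, 0.5, rng) for _ in range(5)]
```

A learner sees only the values in `traces`, without knowing which underlying string produced each one, and must estimate both the support and the weights of the distribution. Trace reconstruction corresponds to the special case where `strings` contains a single string.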
