An O(1T)-error online algorithm for retrieving heavily perturbated statistical databases in the low-dimensional querying mode
Abstract
We give the first O(1T)-error online algorithm for reconstructing noisy statistical databases, where T is the number of (online) sample queries received. The algorithm, which requires only O( T) memory, aims to learn a hidden database-vector w* ∈ RD in order to accurately answer a stream of queries regarding the hidden database, which arrive in an online fashion from some unknown distribution D. We assume the distribution D is defined on the neighborhood of a low-dimensional manifold. The presented algorithm runs in O(dD)-time per query, where d is the dimensionality of the query-space. Contrary to the classical setting, there is no separate training set that is used by the algorithm to learn the database --- the stream on which the algorithm will be evaluated must also be used to learn the database-vector. The algorithm only has access to a binary oracle O that answers whether a particular linear function of the database-vector plus random noise is larger than a threshold, which is specified by the algorithm. We note that we allow for a significant O(D) amount of noise to be added while other works focused on the low noise o(D)-setting. For a stream of T queries our algorithm achieves an average error O(1T) by filtering out random noise, adapting threshold values given to the oracle based on its previous answers and, as a consequence, recovering with high precision a projection of a database-vector w* onto the manifold defining the query-space.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.