Making Progress Based on False Discoveries

Roi Livni

Abstract

The study of adaptive data analysis examines how many statistical queries can be answered accurately using a fixed dataset while avoiding false discoveries (statistically inaccurate answers). In this paper, we tackle a question that precedes this field of study: Is data only valuable when it provides accurate answers to statistical queries? To answer this question, we use Stochastic Convex Optimization as a case study. In this model, algorithms are viewed as analysts who, at each iteration, query an estimate of the gradient of a noisy function and move towards its minimizer. It is known that $O(1/\varepsilon^2)$ examples suffice to minimize the objective function, but none of the existing methods depend on the accuracy of the estimated gradients along the trajectory. Therefore, we ask: How many samples are needed to minimize a noisy convex function if we require $\varepsilon$-accurate estimates of $O(1/\varepsilon^2)$ gradients? Or, might it be that inaccurate gradient estimates are necessary for finding the minimum of a stochastic convex function at an optimal statistical rate? We provide two partial answers to this question. First, we show that a general analyst (whose queries may be maliciously chosen) requires $\Omega(1/\varepsilon^3)$ samples, ruling out the possibility of a foolproof mechanism. Second, we show that, under certain assumptions on the oracle, $\tilde{\Omega}(1/\varepsilon^{2.5})$ samples are necessary for gradient descent to interact with the oracle. Our results are in contrast to classical bounds showing that $O(1/\varepsilon^2)$ samples can optimize the population risk to an accuracy of $O(\varepsilon)$, but with spurious gradients.
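To make the interaction model concrete, here is a minimal Python sketch built on assumptions of our own, not on the paper's constructions: a one-dimensional loss $f(w; z) = |w - z|$, data drawn from Uniform[-1, 1] (so the population gradient is exactly $F'(w) = w$ and the minimizer is $w^* = 0$), fixed-step projected gradient descent on $[-1, 1]$ as the analyst, and an oracle that answers every gradient query with the empirical average over one fixed sample. All function names are hypothetical. The sketch only records how far each oracle answer is from the true population gradient along the trajectory; it does not reproduce the paper's lower-bound arguments or its assumptions on the oracle.

```python
import random
from typing import List, Tuple


def sign(x: float) -> float:
    """Subgradient of |x|, taking the value 0 at x = 0."""
    return float((x > 0) - (x < 0))


def empirical_gradient(w: float, sample: List[float]) -> float:
    """Oracle answer (hypothetical oracle): the average subgradient of the
    assumed loss f(w; z) = |w - z| over the fixed sample, an estimate of F'(w)."""
    return sum(sign(w - z) for z in sample) / len(sample)


def population_gradient(w: float) -> float:
    """Exact gradient of F(w) = E|w - z| for z ~ Uniform[-1, 1] (assumed
    distribution): F'(w) = P(z < w) - P(z > w) = w for w in [-1, 1]."""
    return w


def gradient_descent(sample: List[float], steps: int, eta: float,
                     w0: float = 0.9) -> Tuple[float, List[float]]:
    """Projected gradient descent on [-1, 1] playing the role of the analyst.

    Returns the final iterate and, for every query, the error
    |oracle answer - population gradient| at the queried point."""
    w = w0
    errors = []
    for _ in range(steps):
        g = empirical_gradient(w, sample)           # query the oracle
        errors.append(abs(g - population_gradient(w)))
        w = min(1.0, max(-1.0, w - eta * g))        # gradient step, then project
    return w, errors


if __name__ == "__main__":
    random.seed(0)
    eps = 0.1
    n = round(1 / eps ** 2)              # sample size on the order of 1/eps^2
    sample = [random.uniform(-1.0, 1.0) for _ in range(n)]
    w_final, errors = gradient_descent(sample, steps=n, eta=eps)
    excess_risk = w_final ** 2 / 2       # F(w) - F(0), since F(w) = (w^2 + 1) / 2
    print(f"max gradient error along trajectory: {max(errors):.3f}")
    print(f"excess population risk: {excess_risk:.4f}")
```

For scale, ignoring constants and logarithmic factors, at $\varepsilon = 0.01$ the rates in the abstract correspond to $1/\varepsilon^2 = 10^4$, $1/\varepsilon^{2.5} = 10^5$ and $1/\varepsilon^3 = 10^6$ samples.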
