Data Informativeness in Linear Optimization under Uncertainty

Abstract

We study the problem of determining what data is required to solve a decision-making task when only partial information about the state of the world is available. Focusing on linear programs, we introduce a decision-focused notion of data informativeness that formalizes when a data set is sufficient to recover the optimal decision. This notion abstracts away the choice of estimator (how the data is used): it depends solely on the structure of the optimization task and on the uncertainty set. Our main result provides a geometric characterization of data sufficiency: a data set is sufficient if and only if, together with prior knowledge, it captures all cost directions that can change the optimal solution, given the task structure and the uncertainty set. Building on this characterization, we develop a tractable algorithm to determine minimal sufficient data sets under general data collection constraints. Taken together, our work introduces a principled framework for task-aware data collection. We demonstrate the approach in two applications: selecting where to conduct field experiments to inform infrastructure design, and choosing which candidates to interview in order to make an optimal hiring decision. Our results illustrate that small, carefully selected data sets often suffice to determine the optimal decision.
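To make the idea of a sufficient data set concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm or its geometric characterization. It assumes a finite list of candidate cost vectors (standing in for the uncertainty set), a finite list of feasible decisions (standing in for the vertices of the linear program), and data that consists of observing individual cost coordinates exactly. Under those assumptions, a set of observed coordinates is treated as sufficient when every group of scenarios that produce identical observations shares at least one optimal decision. The function names `optimal_decisions`, `is_sufficient`, and `smallest_sufficient_set` are hypothetical, and the search is brute force.

```python
from itertools import combinations, product


def optimal_decisions(cost, decisions):
    """Indices of the decisions that minimize the linear cost cost . x."""
    values = [sum(c * x for c, x in zip(cost, d)) for d in decisions]
    best = min(values)
    return {i for i, v in enumerate(values) if v == best}


def is_sufficient(observed, scenarios, decisions):
    """Toy sufficiency check (an assumption of this sketch, not the paper's test).

    Scenarios that agree on the observed coordinates are indistinguishable,
    so they must all share at least one optimal decision.
    """
    common = {}
    for cost in scenarios:
        key = tuple(cost[i] for i in observed)
        opt = optimal_decisions(cost, decisions)
        # Intersect the optimal sets of all scenarios with the same observations.
        common[key] = common.get(key, opt) & opt
    return all(common.values())  # an empty intersection means "not determined"


def smallest_sufficient_set(scenarios, decisions):
    """Brute-force search for a smallest sufficient set of coordinates."""
    n = len(scenarios[0])
    for k in range(n + 1):
        for observed in combinations(range(n), k):
            if is_sufficient(observed, scenarios, decisions):
                return observed
    return tuple(range(n))  # observing every coordinate is always sufficient


# Hypothetical hiring example: pick exactly one of three candidates,
# minimizing an uncertain cost. Candidate 2 is never competitive.
decisions = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
scenarios = list(product([1, 3], [2], [5, 6]))

print(smallest_sufficient_set(scenarios, decisions))
```

In this example only candidate 0's cost can change which decision is optimal, so observing that single coordinate is enough, while observing candidate 2 alone is not. The enumeration grows exponentially with the number of coordinates and is meant only to illustrate the definition on small instances.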
