Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation

Abstract

Large-scale subset selection asks for a small useful set of examples, features, sensors, seed users, or context passages from an enormous ground set. Submodular maximization is a canonical model for such diminishing-returns problems, but rapidly growing datasets make even linear-time algorithms ever costlier. We study containment pruning: first reduce the ground set to a smaller core P, then require that P contain a near-optimal feasible solution for every downstream budget up to~k. Prior work has formulated many heuristics, but the theoretical limits of this preprocessing problem are largely unknown. For monotone submodular objectives, we prove that 1-1/e is tight: greedy achieves this containment factor, and no algorithm can beat it even with a larger pruning budget. For non-monotone objectives, we give the first1/2- containment algorithms under cardinality constraints and extend the approach to knapsack constraints. This 1/2 factor exceeds the best known algorithmic ratio and the known hardness threshold for non-monotone maximization, showing that pruning can be provably easier than optimization. Empirically, pruning lets an exact IP solver run on the reduced MaxCut instance with a ≈620× speedup, and proof-of-concept experiments on LLM context selection demonstrate the utility of non-monotone submodular proxies and our proposed containment algorithms.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…