Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

Abstract

Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective -- pick a size-$k$ subset $S$ maximizing $\det(L_S)$ -- is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges from millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver in which the diversity objective drives an eigenvector-dependent operator. The resulting algorithm requires only matrix-vector products with the kernel and runs in time $O((ndk + nk^2)\,t)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with the low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
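To make the combinatorial objective concrete, the following is a minimal sketch of the discrete DPP-MAP problem that the paper relaxes; it is not the paper's solver. It assumes a low-rank kernel $L = BB^\top$ built from a hypothetical feature matrix `B`, and the helper names (`log_det_subset`, `brute_force_map`, `greedy_map`) are illustrative only. The exhaustive search enumerates all size-$k$ subsets and is usable only for tiny $n$; the naive greedy baseline recomputes a determinant for every candidate at every step.

```python
import itertools
import numpy as np

def log_det_subset(B, S):
    """Return log det(L_S) for L = B B^T, using only the rows of B indexed by S."""
    B_S = B[list(S)]
    sign, logdet = np.linalg.slogdet(B_S @ B_S.T)
    # A non-positive determinant means the subset is degenerate (e.g. duplicate rows).
    return logdet if sign > 0 else -np.inf

def brute_force_map(B, k):
    """Exact DPP-MAP by enumerating all size-k subsets (exponential cost, tiny n only)."""
    n = B.shape[0]
    return max(itertools.combinations(range(n), k),
               key=lambda S: log_det_subset(B, S))

def greedy_map(B, k):
    """Naive greedy baseline: repeatedly add the item that most increases log det(L_S)."""
    selected = []
    for _ in range(k):
        remaining = [i for i in range(B.shape[0]) if i not in selected]
        best = max(remaining, key=lambda i: log_det_subset(B, selected + [i]))
        selected.append(best)
    return tuple(sorted(selected))

# Hypothetical toy data: n = 8 items with d = 3 features each.
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 3))
exact = brute_force_map(B, 3)
greedy = greedy_map(B, 3)
```

On a toy instance like this, the greedy subset can never score higher than the exhaustive optimum, and a subset containing two identical feature rows has determinant zero, which is the sense in which the objective rewards diversity.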
