The Most Dispersed Subset of Random Points in R^d
Abstract
Consider a population of N individuals, each having d ≥ 1 different traits, and an additive measure, called dispersion, which rewards large pairwise separations between traits. The goal is to select M ≤ N individuals such that their traits are as dispersed as possible. We compute analytically the full statistics (including large deviation tails) of the maximally achievable dispersion among sub-populations of size M when the traits are independent and identically distributed. Two complementary approaches are developed, one based on a mean-field theory for order statistics, and the other on the replica method from the field of disordered systems. In all dimensions d, and for rotationally symmetric distributions, the optimal subset for large populations consists of all points lying outside a d-dimensional ball whose radius is determined self-consistently. For a single trait (d = 1), the statistics of the maximal dispersion can also be computed for finite N and M. The resulting formulae are corroborated by numerical simulations on small instances and by heuristic algorithms that find near-optimal solutions.
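As an illustration of the selection problem on a small instance, the sketch below compares an exhaustive search over all subsets of size M with the "keep the points farthest from the origin" rule suggested by the large-population result. It rests on assumptions that are not in the abstract: the dispersion is taken to be the sum of squared Euclidean distances over all pairs, the traits are drawn from a standard Gaussian, and the function names (`dispersion`, `brute_force_best`, `outer_shell`, `sample_points`) are hypothetical. It is not the authors' simulation code.

```python
import itertools
import random


def dispersion(points):
    """Assumed dispersion: sum of squared Euclidean distances over all pairs."""
    total = 0.0
    for p, q in itertools.combinations(points, 2):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total


def brute_force_best(points, m):
    """Exhaustively search all subsets of size m (feasible only for small N)."""
    best = max(itertools.combinations(points, m), key=dispersion)
    return list(best), dispersion(best)


def outer_shell(points, m):
    """Heuristic: keep the m points with the largest distance from the origin."""
    ranked = sorted(points, key=lambda p: sum(a * a for a in p), reverse=True)
    chosen = ranked[:m]
    return chosen, dispersion(chosen)


def sample_points(n, d, seed=0):
    """Draw n i.i.d. points in d dimensions from a standard Gaussian (assumed)."""
    rng = random.Random(seed)
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]


if __name__ == "__main__":
    pts = sample_points(n=12, d=2)
    _, exact = brute_force_best(pts, m=5)
    _, shell = outer_shell(pts, m=5)
    print(f"exhaustive optimum: {exact:.4f}")
    print(f"outer-shell rule:   {shell:.4f}")
```

The exhaustive search is always at least as good as the outer-shell rule; for small N the two can differ, since the abstract states the outside-a-ball structure only for large populations.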