Fair Diversity Maximization with Few Representatives
Abstract
Diversity maximization problem is a well-studied problem where the goal is to find k diverse items. Fair diversity maximization aims to select a diverse subset of k items from a large dataset, while requiring that each group of items be well represented in the output. More formally, given a set of items with labels, our goal is to find k items that maximize the minimum pairwise distance in the set, while maintaining that each label is represented within some budget. In many cases, one is only interested in selecting a handful (say a constant) number of items from each group. In such scenario we show that a randomized algorithm based on padded decompositions improves the state-of-the-art approximation ratio to (m)/(3m), where m is the number of labels. The algorithms work in several stages: (i) a preprocessing pruning which ensures that points with the same label are far away from each other, (ii) a decomposition phase, where points are randomly placed in clusters such that there is a feasible solution with maximum one point per cluster and that any feasible solution will be diverse, (iii) assignment phase, where clusters are assigned to labels, and a representative point with the corresponding label is selected from each cluster. We experimentally verify the effectiveness of our algorithm on large datasets.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.