Robust clustering tools based on optimal transportation

Abstract

A robust clustering method for probability distributions in Wasserstein space is introduced. This new "trimmed k-barycenters" approach relies on recent results on barycenters in Wasserstein space that make feasible the intensive computation required by clustering algorithms. The possibility of trimming the most discrepant distributions improves stability and robustness, which is highly convenient in this setting. As a notable application, we consider a parallelized estimation setup in which each of m units processes a portion of the data and produces estimates of k features, encoded as k probability distributions. We prove that the trimmed k-barycenter of the m × k estimates produces a consistent aggregation. We illustrate the methodology with simulated and real data examples, including clustering populations by their age distributions and analyzing cytometric data.
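As a rough illustration, the sketch below restricts attention to univariate distributions. In that case the Wasserstein-2 distance is the L2 distance between quantile functions, and the quantile function of a barycenter is the average of the quantile functions, so trimmed k-barycenters reduce to trimmed k-means on quantile functions. This is an assumption-laden simplification, not the paper's general algorithm: the function names, the grid discretization, the random restarts, and the trimming rule (discard the ⌊αn⌋ distributions farthest from their nearest barycenter) are illustrative choices. The demo mimics the aggregation setup with hypothetical data: m = 10 units each report k = 2 estimated distributions, and one unit's estimate is corrupted.

```python
import numpy as np


def quantile_functions(samples, grid):
    """Stack the empirical quantile functions of 1-D samples on a common grid."""
    return np.vstack([np.quantile(np.asarray(s, dtype=float), grid) for s in samples])


def _assign(Q, centers, keep):
    # Squared W2 distance between 1-D laws = integral over (0, 1) of (F^-1 - G^-1)^2,
    # approximated by the mean over the uniform probability grid.
    d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
    labels = d2.argmin(axis=1)
    best = d2.min(axis=1)
    kept = np.argsort(best)[:keep]  # trimming: drop the farthest distributions
    return labels, kept, best[kept].sum()


def trimmed_k_barycenters(Q, k, alpha, n_iter=100, n_init=10, seed=0):
    """Hypothetical sketch: trimmed k-means on quantile functions (1-D W2 only)."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    keep = n - int(np.floor(alpha * n))  # number of distributions not trimmed
    best_fit = None
    for _ in range(n_init):  # random restarts guard against poor local optima
        centers = Q[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(n_iter):
            labels, kept, _ = _assign(Q, centers, keep)
            new_centers = centers.copy()
            for j in range(k):
                members = kept[labels[kept] == j]
                if members.size:
                    # 1-D W2 barycenter: average of the members' quantile functions
                    new_centers[j] = Q[members].mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        labels, kept, cost = _assign(Q, centers, keep)
        if best_fit is None or cost < best_fit[0]:
            best_fit = (cost, centers, labels, kept)
    _, centers, labels, kept = best_fit
    trimmed = np.ones(n, dtype=bool)
    trimmed[kept] = False
    return centers, labels, trimmed


if __name__ == "__main__":
    # Hypothetical aggregation setup: 10 units each estimate the same 2 features,
    # N(0, 1) and N(5, 1); the last unit's second estimate is corrupted (centred at 50).
    rng = np.random.default_rng(1)
    grid = np.linspace(0.01, 0.99, 99)
    estimates = []
    for unit in range(10):
        estimates.append(rng.normal(0, 1, 200))
        estimates.append(rng.normal(50 if unit == 9 else 5, 1, 200))
    Q = quantile_functions(estimates, grid)  # m * k = 20 quantile functions
    centers, labels, trimmed = trimmed_k_barycenters(Q, k=2, alpha=0.05)
    print("trimmed estimates:", np.flatnonzero(trimmed))
    print("barycenter medians:", np.round(centers[:, 49], 2))
```

Trimming is applied inside every iteration, so a corrupted estimate is excluded before it can pull a barycenter toward itself; in higher dimensions the averaging step would have to be replaced by a genuine Wasserstein barycenter computation.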
