Efficient Correlation Clustering Methods for Large Consensus Clustering Instances
Abstract
Consensus clustering (or clustering aggregation) takes as input k partitions of a given ground set V and seeks a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods such as the popular Pivot algorithm. Unfortunately, these methods have not proved practical for consensus clustering instances where either k or |V| gets large. In this paper we provide practical run time improvements for correlation clustering solvers when |V| is large. We reduce the time complexity of Pivot from O(|V|^2 k) to O(|V| k), and its space complexity from O(|V|^2) to O(|V| k) -- a significant savings, since in practice k is much less than |V|. We also analyze a sampling method for these algorithms when k is large, bridging the gap between running Pivot on the full set of input partitions (an expected 1.57-approximation) and choosing a single input partition at random (an expected 2-approximation). We show experimentally that algorithms like Pivot obtain quality clustering results in practice even on small samples of input partitions.
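To make the setting concrete, here is a minimal Python sketch of Pivot applied to consensus clustering, together with the idea of running it on a random sample of the input partitions. It is an illustration under stated assumptions, not the paper's implementation: the function names (`pivot_consensus`, `sample_partitions`, `disagreements`) and the majority-vote rule for joining a pivot's cluster are choices made here. The sketch stores only the k label arrays, so it uses O(|V| k) space instead of a |V| x |V| matrix, but it does not reproduce the paper's O(|V| k) running time.

```python
import random


def pivot_consensus(labels, rng=None):
    """Pivot on k input partitions given as label arrays.

    labels[i][v] is the cluster id of element v in input partition i.
    Assumption of this sketch: an unclustered element joins the pivot's
    cluster when a strict majority of the partitions place the two together.
    """
    rng = rng or random.Random(0)
    k = len(labels)
    n = len(labels[0])

    # Visiting elements in a random order and skipping clustered ones is
    # equivalent to repeatedly drawing a random unclustered pivot.
    order = list(range(n))
    rng.shuffle(order)

    cluster = [-1] * n  # -1 marks "not yet clustered"
    next_id = 0
    for p in order:
        if cluster[p] != -1:
            continue
        cluster[p] = next_id
        for u in range(n):
            if cluster[u] != -1:
                continue
            # Agreement is computed on demand from the label arrays,
            # so no pairwise matrix is ever stored.
            together = sum(1 for part in labels if part[u] == part[p])
            if 2 * together > k:
                cluster[u] = next_id
        next_id += 1
    return cluster


def sample_partitions(labels, s, rng=None):
    """Draw s of the k input partitions uniformly without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(labels, s)


def disagreements(labels, cluster):
    """Total pairwise disagreements between `cluster` and all inputs."""
    n = len(cluster)
    total = 0
    for u in range(n):
        for v in range(u + 1, n):
            same = cluster[u] == cluster[v]
            total += sum(1 for part in labels if (part[u] == part[v]) != same)
    return total
```

A possible usage: call `pivot_consensus(sample_partitions(labels, s))` for a small `s`, then score the result against the full input with `disagreements(labels, result)` to see how much quality is lost by sampling. The scoring function is quadratic in |V| and is meant only for checking small instances.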