Near-optimal Algorithms for Explainable k-Medians and k-Means

Abstract

We consider the problem of explainable k-medians and k-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into k clusters and minimizes the k-medians or k-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde{O}(\log k)$ competitive with k-medians with $\ell_1$ norm and $\tilde{O}(k)$ competitive with k-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for k-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for k-medians; in this work, we prove a lower bound of $\tilde{\Omega}(k)$ for k-means. We also provide a lower bound of $\Omega(\log k)$ for k-medians with $\ell_2$ norm.
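
To make the object in the abstract concrete, the following is a minimal sketch of how a threshold tree induces a clustering and how its k-medians cost under the $\ell_1$ norm could be evaluated. It is not the algorithm proposed in the paper: the tree is fixed by hand, and the tuple encoding, the function names `assign` and `k_medians_l1_cost`, and the toy data are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical encoding (not from the paper): a leaf is an int cluster id;
# an internal node is (feature_index, threshold, left_subtree, right_subtree).
def assign(tree, x):
    """Follow single-feature threshold cuts until a leaf (cluster id) is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def k_medians_l1_cost(tree, points):
    """Sum of l1 distances from each point to its cluster's coordinate-wise median."""
    clusters = {}
    for p in points:
        clusters.setdefault(assign(tree, p), []).append(p)
    cost = 0.0
    for members in clusters.values():
        # The coordinate-wise median minimizes the l1 cost within a cluster.
        center = [median(col) for col in zip(*members)]
        cost += sum(abs(a - c) for p in members for a, c in zip(p, center))
    return cost

# Toy data and a hand-built tree with k = 3 leaves (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 5.0), (10.0, 6.0)]
tree = (0, 5.0, 0, (1, 2.5, 1, 2))  # split on feature 0, then on feature 1

labels = [assign(tree, p) for p in points]
cost = k_medians_l1_cost(tree, points)
```

Each cluster here is described by at most two single-feature comparisons, which is what makes the clustering explainable. The competitive ratios in the abstract compare the cost of such a tree-induced clustering with the cost of an optimal clustering that is not restricted to threshold trees.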
