Improved Algorithms for Clustering with Noisy Distance Oracles

Abstract

Bateni et al. have recently introduced the weak-strong distance oracle model to study clustering problems in settings with limited distance information. Given query access to the strong oracle and the weak oracle in this model, the authors design approximation algorithms for the k-means and k-center clustering problems. In this work, we design algorithms with improved guarantees for both problems in the weak-strong oracle model. The k-means++ algorithm is routinely used to solve k-means in settings where complete distance information is available. One of the main contributions of this work is to show that the k-means++ algorithm can be adapted to work in the weak-strong oracle model using only a small number of strong-oracle queries, which are the critical resource in this model. In particular, our k-means++ based algorithm gives a constant approximation for k-means and uses $O(k^2 \log^2 n)$ strong-oracle queries. This improves on the algorithm of Bateni et al., which uses $O(k^2 \log^4 n \log^2 \log n)$ strong-oracle queries for a constant-factor approximation of k-means. For the k-center problem, we give a simple ball-carving based $6(1 + \varepsilon)$-approximation algorithm that uses $O(k^3 \log^2 n \log \frac{\log n}{\varepsilon})$ strong-oracle queries. This is an improvement over the $14(1 + \varepsilon)$-approximation algorithm of Bateni et al., which uses $O(k^2 \log^4 n \log^2 \frac{\log n}{\varepsilon})$ strong-oracle queries. To show the effectiveness of our algorithms, we perform empirical evaluations on real-world datasets and show that our algorithms significantly outperform the algorithms of Bateni et al.
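For readers unfamiliar with k-means++, the following is a minimal sketch of the classical seeding step in the full-information setting, where every distance can be computed exactly. It is not the adaptation to the weak-strong oracle model described in the abstract; the names `kmeanspp_seed`, `euclid`, and the `dist` parameter are illustrative assumptions, with `dist` standing in for an exact distance function.

```python
import random


def euclid(a, b):
    """Exact Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def kmeanspp_seed(points, k, dist=euclid, rng=None):
    """Classical k-means++ (D^2-sampling) seeding with exact distances.

    Illustrative only: assumes full distance information, unlike the
    weak-strong oracle setting studied in the paper.
    """
    rng = rng or random.Random(0)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point already coincides with a chosen center
        # Sample the next center with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Refresh nearest-center distances against the new center.
        d2 = [min(old, dist(p, points[idx]) ** 2) for old, p in zip(d2, points)]
    return centers
```

Each round recomputes a distance from every point to the new center, so the classical procedure uses on the order of nk exact distance evaluations. This is the kind of cost that motivates treating exact (strong-oracle) queries as the scarce resource.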
