Sharp kernel clustering algorithms and their associated Grothendieck inequalities
Abstract
In the kernel clustering problem we are given a (large) $n\times n$ symmetric positive semidefinite matrix $A=(a_{ij})$ with $\sum_{i=1}^n\sum_{j=1}^n a_{ij}=0$ and a (small) $k\times k$ symmetric positive semidefinite matrix $B=(b_{ij})$. The goal is to find a partition $\{S_1,\ldots,S_k\}$ of $\{1,\ldots,n\}$ which maximizes $\sum_{i=1}^k\sum_{j=1}^k \left(\sum_{(p,q)\in S_i\times S_j} a_{pq}\right) b_{ij}$. We design a polynomial time approximation algorithm that achieves an approximation ratio of $\frac{R(B)^2}{C(B)}$, where $R(B)$ and $C(B)$ are geometric parameters that depend only on the matrix $B$, defined as follows: if $b_{ij}=\langle v_i,v_j\rangle$ is the Gram matrix representation of $B$ for some $v_1,\ldots,v_k\in\mathbb{R}^k$, then $R(B)$ is the minimum radius of a Euclidean ball containing the points $\{v_1,\ldots,v_k\}$. The parameter $C(B)$ is defined as the maximum over all measurable partitions $\{A_1,\ldots,A_k\}$ of $\mathbb{R}^{k-1}$ of the quantity $\sum_{i=1}^k\sum_{j=1}^k b_{ij}\langle z_i,z_j\rangle$, where for $i\in\{1,\ldots,k\}$ the vector $z_i\in\mathbb{R}^{k-1}$ is the Gaussian moment of $A_i$, i.e., $z_i=\frac{1}{(2\pi)^{(k-1)/2}}\int_{A_i} x\,e^{-\|x\|_2^2/2}\,dx$. We also show that for every $\varepsilon>0$, achieving an approximation guarantee of $(1-\varepsilon)\frac{R(B)^2}{C(B)}$ is Unique Games hard.
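To make the objective concrete, here is a minimal Python sketch that evaluates the kernel clustering objective for a given partition and finds the optimum of a tiny instance by exhaustive search. It is only an illustration of the problem statement, not the approximation algorithm of the paper. The function names `clustering_objective` and `brute_force_best`, the encoding of a partition as a label vector, and the toy matrices are assumptions made for this example; the search also allows some parts of the partition to be empty.

```python
import itertools

import numpy as np


def clustering_objective(A, B, labels):
    """Return sum_{i,j} b_ij * sum_{(p,q) in S_i x S_j} a_pq.

    labels[p] = i means that index p belongs to the part S_i.
    """
    n, k = A.shape[0], B.shape[0]
    # Indicator matrix X: X[p, i] = 1 exactly when p is in S_i.
    X = np.zeros((n, k))
    X[np.arange(n), labels] = 1.0
    # Entry (i, j) of X^T A X is the sum of a_pq over S_i x S_j.
    block_sums = X.T @ A @ X
    return float(np.sum(block_sums * B))


def brute_force_best(A, B):
    """Try all k^n label vectors; feasible only for very small n."""
    n, k = A.shape[0], B.shape[0]
    best_value, best_labels = -np.inf, None
    for labels in itertools.product(range(k), repeat=n):
        value = clustering_objective(A, B, np.array(labels))
        if value > best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels


# Toy instance (hypothetical data): A = x x^T is positive semidefinite,
# and its entries sum to (sum of x)^2 = 0, as the problem requires.
x = np.array([1.0, 1.0, -1.0, -1.0])
A = np.outer(x, x)
B = np.eye(2)  # k = 2, B is the identity matrix

best_value, best_labels = brute_force_best(A, B)
print(best_value, best_labels)
```

With $B$ equal to the identity, the objective only rewards entries of $A$ whose row and column fall in the same part, so on this toy instance the best partitions separate the positive coordinates of `x` from the negative ones.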