A faster subquadratic algorithm for finding outlier correlations

Abstract

We study the problem of detecting outlier pairs of strongly correlated variables among a collection of \(n\) variables with otherwise weak pairwise correlations. After normalization, this amounts to the geometric task where we are given as input a set of \(n\) vectors with unit Euclidean norm and dimension \(d\), and for some constants \(0<\tau<\rho<1\), we are asked to find all the outlier pairs of vectors whose inner product is at least \(\rho\) in absolute value, subject to the promise that all but at most \(q\) pairs of vectors have inner product at most \(\tau\) in absolute value. Improving on an algorithm of G. Valiant [FOCS 2012; J. ACM 2015], we present a randomized algorithm that for Boolean inputs (\(\{-1,1\}\)-valued data normalized to unit Euclidean length) runs in time \[ \tilde O\bigl(n^{\max\{1-\gamma+M(\Delta\gamma,\gamma),\,M(1-\gamma,2\Delta\gamma)\}}+qdn^{2\gamma}\bigr)\,, \] where \(0<\gamma<1\) is a constant tradeoff parameter, \(M(\mu,\nu)\) is the exponent to multiply an \(n^{\mu}\times n^{\nu}\) matrix with an \(n^{\nu}\times n^{\mu}\) matrix, and \(\Delta=1/(1-\log_\tau\rho)\). As corollaries we obtain randomized algorithms that run in time \[ \tilde O\bigl(n^{\frac{2\omega}{3-\log_\tau\rho}}+qdn^{\frac{2(1-\log_\tau\rho)}{3-\log_\tau\rho}}\bigr) \] and in time \[ \tilde O\bigl(n^{\frac{4}{2+\alpha(1-\log_\tau\rho)}}+qdn^{\frac{2\alpha(1-\log_\tau\rho)}{2+\alpha(1-\log_\tau\rho)}}\bigr)\,, \] where \(2\leq\omega<2.38\) is the exponent for square matrix multiplication and \(0.3<\alpha\leq 1\) is the exponent for rectangular matrix multiplication. The notation \(\tilde O(\cdot)\) hides polylogarithmic factors in \(n\) and \(d\) whose degree may depend on \(\rho\) and \(\tau\). We present further corollaries for the light bulb problem and for learning sparse Boolean functions.
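
To get a feel for the two corollary bounds, the following minimal Python sketch evaluates their exponents of n for concrete thresholds. It writes ρ for the outlier threshold (inner product at least ρ in absolute value) and τ for the background bound, with 0<τ<ρ<1, and uses log_τ ρ = ln ρ / ln τ. The function names are hypothetical, and the default values ω = 2.38 and α = 0.3 are placeholders taken from the ends of the ranges quoted in the abstract, not the best known constants.

```python
import math


def log_tau_rho(rho: float, tau: float) -> float:
    """Return log_tau(rho) = ln(rho) / ln(tau), assuming 0 < tau < rho < 1."""
    if not (0.0 < tau < rho < 1.0):
        raise ValueError("expected 0 < tau < rho < 1")
    return math.log(rho) / math.log(tau)


def delta(rho: float, tau: float) -> float:
    """Return Delta = 1 / (1 - log_tau(rho)), as defined in the abstract."""
    return 1.0 / (1.0 - log_tau_rho(rho, tau))


def corollary_exponents(rho: float, tau: float,
                        omega: float = 2.38, alpha: float = 0.3):
    """Exponents of n in the two corollary running-time bounds.

    omega and alpha are assumed placeholder values (see the lead-in).
    Each result maps "main" to the exponent of the first term and
    "per_outlier" to the exponent of n in the q*d*n^(...) term.
    """
    log_ratio = log_tau_rho(rho, tau)
    # Bound stated in terms of square matrix multiplication (omega).
    square = {
        "main": 2.0 * omega / (3.0 - log_ratio),
        "per_outlier": 2.0 * (1.0 - log_ratio) / (3.0 - log_ratio),
    }
    # Bound stated in terms of rectangular matrix multiplication (alpha).
    denom = 2.0 + alpha * (1.0 - log_ratio)
    rect = {
        "main": 4.0 / denom,
        "per_outlier": 2.0 * alpha * (1.0 - log_ratio) / denom,
    }
    return square, rect


if __name__ == "__main__":
    square, rect = corollary_exponents(rho=0.5, tau=0.25)
    print("square-matrix bound:", square)
    print("rectangular-matrix bound:", rect)
```

Under these assumptions the rectangular-matrix exponent 4/(2+α(1-log_τ ρ)) stays below 2 for every admissible pair τ<ρ, whereas the square-matrix exponent 2ω/(3-log_τ ρ) drops below 2 only when log_τ ρ < 3-ω, that is, when ρ is sufficiently larger than τ.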
