In-database connected component analysis
Abstract
We describe a Big Data-practical, SQL-implementable algorithm for efficiently determining connected components for graph data stored in a Massively Parallel Processing (MPP) relational database. The algorithm described is a linear-space, randomised algorithm, always terminating with the correct answer but subject to a stochastic running time, such that for any ε>0 and any input graph G= V, E the algorithm terminates after O( |V|) SQL queries with probability of at least 1-ε, which we show empirically to translate to a quasi-linear runtime in practice.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.