On the Analysis of a Label Propagation Algorithm for Community Detection
Abstract
This paper initiates the formal analysis of a simple, distributed algorithm for community detection on networks. We analyze an algorithm that we call Max-LPA, both in terms of its convergence time and in terms of the "quality" of the communities detected. Max-LPA is an instance of a class of community detection algorithms called label propagation algorithms. As far as we know, most analysis of label propagation algorithms thus far has been empirical in nature, and in this paper we seek a theoretical understanding of label propagation algorithms. In our main result, we define a clustered version of random graphs with clusters $V_1, V_2, \ldots, V_k$, where the probability $p$ of an edge connecting nodes within a cluster $V_i$ is higher than $p'$, the probability of an edge connecting nodes in distinct clusters. We show that even with fairly general restrictions on $p$ and $p'$ ($p = \Omega(1/n^{1/4-\epsilon})$ for any $\epsilon > 0$, $p' = O(p^2)$, where $n$ is the number of nodes), Max-LPA detects the clusters $V_1, V_2, \ldots, V_k$ in just two rounds. Based on this and on empirical results, we conjecture that Max-LPA can correctly and quickly identify communities on clustered graphs even when the clusters are much sparser, i.e., with $p = c \log n / n$ for some $c > 1$.
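To make the clustered random graph model concrete, the following is a minimal Python sketch, not the authors' implementation. The sampler follows the abstract directly: pairs of nodes in the same cluster are joined with probability p, and pairs in distinct clusters with probability p'. The update rule in `max_lpa_round` is an assumption, since the abstract does not state it: each node starts with a unique label and, in each synchronous round, adopts the most frequent label in its closed neighborhood, breaking ties in favor of the larger label. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


def sample_clustered_graph(sizes, p, p_prime, seed=None):
    """Sample a clustered random graph with the given cluster sizes.

    Intra-cluster pairs are joined with probability p, inter-cluster
    pairs with probability p_prime (the model in the abstract).
    Returns (adjacency dict of sets, list mapping node -> cluster index).
    """
    rng = random.Random(seed)
    cluster_of = [i for i, size in enumerate(sizes) for _ in range(size)]
    n = len(cluster_of)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if cluster_of[u] == cluster_of[v] else p_prime
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj, cluster_of


def max_lpa_round(adj, labels):
    """One synchronous round of the ASSUMED Max-LPA update rule.

    Each node counts the labels of its neighbors and itself, then takes
    the most frequent label; ties go to the numerically largest label.
    """
    new_labels = list(labels)
    for v, nbrs in adj.items():
        counts = Counter(labels[u] for u in nbrs)
        counts[labels[v]] += 1  # closed neighborhood includes v itself
        best = max(counts.values())
        new_labels[v] = max(l for l, c in counts.items() if c == best)
    return new_labels


def run_max_lpa(adj, rounds):
    """Start from unique labels (node ids) and run a fixed number of rounds."""
    labels = list(range(len(adj)))
    for _ in range(rounds):
        labels = max_lpa_round(adj, labels)
    return labels
```

As a sanity check under these assumptions, calling `sample_clustered_graph([3, 4], 1.0, 0.0)` produces two disjoint cliques, and `run_max_lpa` then assigns every node the largest initial label in its own clique after a single round. For sparser settings of p and p', one could compare the labels returned after two rounds against `cluster_of` to see how often the clusters are recovered.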