Optimal Estimation of the Number of Communities
Abstract
Estimating the number of communities K is a fundamental problem in network analysis. We consider a broad setting that allows severe degree heterogeneity and a wide range of sparsity levels, and propose Stepwise Goodness-of-Fit (StGoF) as a new approach. This is a stepwise algorithm: for m = 1, 2, …, we alternate between a community detection step and a goodness-of-fit (GoF) step. We adapt SCORE for community detection and propose a new GoF metric. We show that at step m, the GoF metric diverges to ∞ in probability for all m < K and converges to N(0,1) if m = K. This gives rise to a consistent estimate for K. We also identify the right way to define the signal-to-noise ratio (SNR) for our problem, and show that consistent estimates for K do not exist if SNR → 0, while StGoF is uniformly consistent for K if SNR → ∞. Therefore, StGoF achieves the optimal phase transition. Similar stepwise methods (e.g., wang2017likelihood, ma2018determining) are known to face analytical challenges. We overcome these challenges by using a different stepwise scheme in StGoF and by deriving sharp results that were not previously available. The key to our analysis is to show that SCORE has the Non-Splitting Property (NSP). Primarily due to a non-tractable rotation of eigenvectors dictated by the Davis-Kahan sin(θ) theorem, the NSP is non-trivial to prove and requires new techniques that we develop.
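The stepwise scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `score_labels` and `stepwise_estimate_k`, the parameters `alpha` and `m_max`, and the stopping rule (return the first m whose statistic does not exceed the upper-`alpha` standard normal quantile) are hypothetical choices motivated by the limits stated above. The abstract does not specify the GoF metric, so it is supplied by the caller as `gof_stat`. The clustering step is a simplified SCORE-style routine (entrywise eigenvector ratios followed by plain k-means), and the truncation level log(n) is likewise an assumption.

```python
import numpy as np
from statistics import NormalDist


def score_labels(A, m, n_iter=50, seed=0):
    """Simplified SCORE-style clustering of a symmetric adjacency matrix A into m groups."""
    n = A.shape[0]
    if m == 1:
        return np.zeros(n, dtype=int)  # a single community needs no clustering
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-np.abs(vals))[:m]  # m eigenvalues largest in magnitude
    xi = vecs[:, top]
    lead = xi[:, 0]
    lead = np.where(np.abs(lead) < 1e-12, 1e-12, lead)  # guard against division by zero
    # Entrywise ratios against the leading eigenvector remove degree effects.
    ratios = xi[:, 1:] / lead[:, None]
    t = np.log(n)  # assumed truncation level
    ratios = np.clip(ratios, -t, t)
    # Plain Lloyd k-means on the (m - 1)-dimensional ratio vectors.
    rng = np.random.default_rng(seed)
    centers = ratios[rng.choice(n, size=m, replace=False)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        dist = ((ratios[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(m):
            if np.any(labels == k):
                centers[k] = ratios[labels == k].mean(axis=0)
    return labels


def stepwise_estimate_k(A, gof_stat, detect=score_labels, alpha=0.05, m_max=10):
    """Return the first m whose GoF statistic looks like an N(0,1) draw.

    gof_stat(A, labels, m) is a user-supplied statistic (hypothetical interface)
    assumed to diverge for m < K and to be approximately N(0,1) at m = K.
    """
    z = NormalDist().inv_cdf(1 - alpha)  # upper-alpha standard normal quantile
    for m in range(1, m_max + 1):
        labels = detect(A, m)  # community detection step
        if gof_stat(A, labels, m) <= z:  # goodness-of-fit step
            return m
    return m_max  # no m passed the check within the search range
```

A caller would plug in a concrete statistic for `gof_stat`; the loop itself only relies on the two limiting behaviors stated in the abstract, which is why the statistic is kept as a parameter here.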