Information-Theoretic and Computational Limits of Correlation Detection under Graph Sampling

Abstract

Correlation analysis is a fundamental problem in statistics. In this paper, we consider the correlation detection problem between a pair of Erdős-Rényi graphs. Specifically, the problem is formulated as a hypothesis testing problem: under the null hypothesis, the two graphs are independent; under the alternative hypothesis, the two graphs are edge-correlated through a latent permutation. We focus on the scenario where only two induced subgraphs are sampled, and characterize the sample size threshold for detection. At the information-theoretic level, we establish sample complexity rates that are optimal up to constant factors over most parameter regimes, and the remaining gap is bounded by a subpolynomial factor. On the algorithmic side, we propose polynomial-time tests based on counting trees and bounded-degree motifs, and identify the regimes where they succeed. Moreover, leveraging the low-degree conjecture, we provide evidence of computational hardness that matches our achievable guarantees, showing that the proposed polynomial-time tests are rate-optimal. Together, these results reveal a statistical-computational gap in the sample size required for correlation detection. Finally, we validate the proposed algorithms on synthetic data and a real coauthor network, demonstrating strong empirical performance.
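
To make the testing setup concrete, the following is a minimal simulation sketch of the two hypotheses and of induced-subgraph sampling. It is not the paper's procedure. The abstract does not specify the parameterisation, so the parent-graph subsampling construction, the parameters `n`, `p`, `s`, `m`, and all function names below are illustrative assumptions.

```python
import numpy as np

def sample_er(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős-Rényi graph G(n, p), no self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)  # one coin flip per vertex pair
    return (upper | upper.T).astype(np.int8)

def sample_pair(n, p, s, correlated, rng):
    """Draw (A, B) under the null (independent) or the alternative (correlated).

    Assumed model (not taken from the abstract): under the alternative, each
    graph keeps every edge of a common parent G(n, p) independently with
    probability s, and the vertices of B are relabelled by a uniformly random
    latent permutation. Under the null, A and B are independent G(n, p * s),
    so the marginal edge density is the same under both hypotheses.
    """
    if not correlated:
        return sample_er(n, p * s, rng), sample_er(n, p * s, rng)
    parent = sample_er(n, p, rng)
    A = parent * sample_er(n, s, rng)  # keep each parent edge with probability s
    B = parent * sample_er(n, s, rng)
    pi = rng.permutation(n)            # latent permutation hiding the alignment
    return A, B[np.ix_(pi, pi)]

def induced_subgraph(adj, m, rng):
    """Induced subgraph on m vertices chosen uniformly without replacement."""
    keep = rng.choice(adj.shape[0], size=m, replace=False)
    return adj[np.ix_(keep, keep)]

rng = np.random.default_rng(0)
A, B = sample_pair(n=200, p=0.1, s=0.8, correlated=True, rng=rng)
A_sub = induced_subgraph(A, m=50, rng=rng)
B_sub = induced_subgraph(B, m=50, rng=rng)
```

In this sketch a detection test would receive only `A_sub` and `B_sub` and would have to decide which hypothesis generated them, without access to the permutation or to the unsampled vertices.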
