Limit results for distributed estimation of invariant subspaces in multiple networks inference and PCA
Abstract
Several statistical problems, such as multiple heterogeneous graph analysis, distributed PCA, integrative data analysis, and simultaneous dimension reduction of images, can involve a collection of m matrices whose leading subspaces U(i) consist of a shared subspace Uc and individual subspaces Us(i). We consider a distributed estimation procedure that first obtains U(i) as the leading singular vectors of each observed noisy matrix, then computes Uc as the leading left singular vectors of the concatenated matrix [U(1) | U(2) | … | U(m)], and finally computes Us(i) as the leading singular vectors of the projection of each U(i) onto the orthogonal complement of Uc. In this paper, we provide a framework for deriving limit results for such distributed estimation procedures, including expansions of the estimation errors in both the common and individual subspaces and their asymptotically normal approximations. We apply this framework specifically to (1) parameter estimation for multiple heterogeneous random graphs with shared subspaces, and (2) distributed PCA for independent sub-Gaussian random vectors with spiked covariance structures. Leveraging these results, we also consider a two-sample test of the null hypothesis that a pair of random graphs have the same edge probabilities, and present a test statistic that converges in distribution to a central (resp., non-central) χ² distribution under the null (resp., local alternative) hypothesis.
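The three-step distributed procedure described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions the abstract does not state: the dimensions of Uc and of each Us(i) are treated as known, all observed matrices share the same number of rows n, and the toy data (symmetric rank-3 signal plus Gaussian noise, with the sizes, eigenvalues, noise level, and seed chosen here) is hypothetical. The function names are illustrative only.

```python
import numpy as np


def leading_left_singular_vectors(M, k):
    """Return the k leading left singular vectors of M as an (n, k) array."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)  # singular values sorted descending
    return U[:, :k]


def distributed_subspace_estimate(matrices, d_common, d_individual):
    """Three-step estimate of the shared subspace Uc and individual subspaces Us(i).

    matrices     : list of m observed noisy arrays, each with n rows.
    d_common     : dimension of Uc (assumed known in this sketch).
    d_individual : list of dimensions of each Us(i) (assumed known).
    """
    # Step 1: local estimate U(i) computed from each matrix separately.
    local = [leading_left_singular_vectors(A, d_common + d_s)
             for A, d_s in zip(matrices, d_individual)]
    # Step 2: Uc from the concatenated matrix [U(1) | U(2) | ... | U(m)].
    U_c = leading_left_singular_vectors(np.hstack(local), d_common)
    # Step 3: Us(i) from the projection of U(i) onto the orthogonal complement of Uc.
    P_perp = np.eye(U_c.shape[0]) - U_c @ U_c.T
    U_s = [leading_left_singular_vectors(P_perp @ U_i, d_s)
           for U_i, d_s in zip(local, d_individual)]
    return U_c, U_s


def projection_distance(U_hat, U):
    """Spectral-norm distance between the projections onto span(U_hat) and span(U)."""
    return np.linalg.norm(U_hat @ U_hat.T - U @ U.T, ord=2)


# Hypothetical toy setup: m = 3 symmetric n x n matrices, each equal to
# [Uc | Us(i)] diag(100, 80, 60) [Uc | Us(i)]^T plus symmetric Gaussian noise.
rng = np.random.default_rng(0)
n, m, d_c = 300, 3, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, d_c + m)))  # orthonormal columns
U_c_true = Q[:, :d_c]
U_s_true = [Q[:, d_c + i: d_c + i + 1] for i in range(m)]

observed = []
for i in range(m):
    V = np.hstack([U_c_true, U_s_true[i]])
    signal = V @ np.diag([100.0, 80.0, 60.0]) @ V.T
    noise = 0.1 * rng.standard_normal((n, n))
    observed.append(signal + (noise + noise.T) / np.sqrt(2))  # symmetrized noise

U_c_hat, U_s_hat = distributed_subspace_estimate(observed, d_c, [1] * m)
print("shared-subspace error:", projection_distance(U_c_hat, U_c_true))
print("individual errors:",
      [projection_distance(Uh, U) for Uh, U in zip(U_s_hat, U_s_true)])
```

Errors are measured between projection matrices rather than between the singular vectors themselves, because singular vectors are only determined up to sign (and up to rotation within repeated singular values), whereas the projection onto their span is unique.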