Information-Theoretic Bounds for Sparse Covariance Estimation in the Vertical-Split Distributed Model

Abstract

We study the minimax estimation error for distributed covariance matrix estimation in the vertical-split (feature-split) setting, where two agents each observe different coordinates of $m$ i.i.d. sub-Gaussian samples and communicate a limited number of bits to a central server. While Rahmani et al. [2025] established nearly tight bounds for dense (unstructured) cross-covariance matrices, we investigate whether imposing elementwise $s$-sparsity on the cross-covariance $C_{21}$ can reduce the required communication and sample complexity. In contrast to the horizontal-split setting, where Braverman et al. [2016] showed that sparsity does not reduce communication cost for mean estimation, we prove that sparsity does help for cross-covariance estimation in the vertical split. Specifically, we establish minimax lower bounds showing that the communication budget per agent scales as $B_k = \Omega(\sigma^4 d_k\, s' \log(d_1 d_2/s')/\varepsilon^2)$ and the sample complexity for cross-covariance estimation as $m = \Omega(\sigma^4\, s' \log(d_1 d_2/s')/\varepsilon^2)$, where $s' = s \wedge d$. For the 1-sparse case, this yields an exponential improvement in the dimension dependence, from $d_1 d_2$ to $\log(d_1 d_2)$, relative to the dense rate. Our lower bounds are established via Fano's method with an explicit sparse packing, built from a Varshamov--Gilbert-type argument for signed partial permutation matrices, combined with the Conditional Strong Data Processing Inequality of Rahmani et al. [2025]. We show that the bounds are tight by giving a matching achievable scheme, based on covering-net quantization and entry-wise hard thresholding, that attains the $s$-sparse lower bound up to polylogarithmic factors.
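The entry-wise hard thresholding step mentioned above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the scheme analyzed in the paper: the communication constraint and the covering-net quantization are omitted entirely (the server is assumed to see both agents' samples), Gaussian samples stand in for the sub-Gaussian case, and the function names, the threshold constant 3, and the threshold order $\sqrt{\log(d_1 d_2)/m}$ are illustrative choices rather than values taken from the paper.

```python
import numpy as np


def sample_cross_cov(x1, x2):
    """Empirical cross-covariance of zero-mean samples (rows are samples).

    Returns a (d2, d1) matrix estimating E[x2 x1^T].
    """
    m = x1.shape[0]
    return x2.T @ x1 / m


def hard_threshold(mat, tau):
    """Entry-wise hard thresholding: zero every entry with magnitude below tau."""
    return np.where(np.abs(mat) >= tau, mat, 0.0)


rng = np.random.default_rng(0)
m, d1, d2 = 4000, 8, 6

# Hypothetical 1-sparse cross-covariance: a single nonzero entry.
c21 = np.zeros((d2, d1))
c21[2, 5] = 0.5

# Agent 1 holds x1, agent 2 holds x2 (vertical split of the same m samples).
# Construction: x1 ~ N(0, I), x2 = C21 x1 + noise, with noise covariance
# I - C21 C21^T so that Cov(x2) = I and E[x2 x1^T] = C21.
x1 = rng.standard_normal((m, d1))
noise_chol = np.linalg.cholesky(np.eye(d2) - c21 @ c21.T)
x2 = x1 @ c21.T + rng.standard_normal((m, d2)) @ noise_chol.T

# Assumed threshold of order sqrt(log(d1 * d2) / m); the constant 3 is arbitrary.
tau = 3.0 * np.sqrt(np.log(d1 * d2) / m)

raw = sample_cross_cov(x1, x2)
est = hard_threshold(raw, tau)

print("nonzero entries after thresholding:", np.count_nonzero(est))
print("max-entry error before:", np.max(np.abs(raw - c21)))
print("max-entry error after: ", np.max(np.abs(est - c21)))
```

Thresholding removes the off-support entries of the empirical cross-covariance, whose magnitudes are of order $1/\sqrt{m}$, while leaving the large entry untouched. That is why the estimation error in the sparse case can depend on the dimensions only through the logarithmic factor in the threshold.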
