2B or Not 2B: A Tale of Three Algorithms for Streaming Covariance Estimation after Welford and Chan-Golub-LeVeque

Abstract

We place three algorithms for computing the unbiased sample covariance matrix in streaming and distributed settings on a common algebraic, numerical, and statistical foundation. The Gram algorithm, derived from the variance reformulation, maintains the running cross-product matrix $G_t = \sum_{i=1}^{t} x_i x_i^\top$ and the column-sum vector $s_t = \sum_{i=1}^{t} x_i$, yielding the unbiased covariance estimator $S_t = (t-1)^{-1}(G_t - t^{-1} s_t s_t^\top)$ in $O(p^2)$ time per update. The Welford algorithm propagates a running mean $m_t$ and outer-product corrections $M_t$, with updates $m_t = m_{t-1} + (x_t - m_{t-1})/t$ and $M_t = M_{t-1} + (x_t - m_{t-1})(x_t - m_t)^\top$, achieving the same asymptotic cost with improved numerical stability under large data shifts. The Chan-Golub-LeVeque (CGL) algorithm supports block-parallel merging through the exact identity $M = M_A + M_B + \frac{n_A n_B}{n_A + n_B}(m_B - m_A)(m_B - m_A)^\top$, making it the natural choice for distributed and map-reduce architectures. All three algorithms produce the same estimator $S_t = M_t/(t-1)$ in exact arithmetic, although their finite-precision behavior differs markedly. Beyond runtime and numerical comparisons, we introduce a conformal prediction framework for streaming covariance estimation that yields finite-sample, distribution-free confidence sets $C_{t,jk}$ for each entry $S_{t,jk}$ of the covariance matrix at any step $t$ of the data stream. Experiments confirm that the Gram algorithm is fastest for batch computation, Welford is uniquely robust to catastrophic cancellation under large mean shifts, CGL is optimal for distributed settings, and conformal intervals achieve the nominal coverage level across all three algorithms.
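The three update rules stated in the abstract can be sketched in a few lines of dependency-free Python. This is an illustrative sketch only, not the paper's implementation: the function names (`gram_cov`, `welford_state`, `cgl_merge`, `cov_from_state`), the list-of-lists matrix layout, the `(n, m, M)` state tuple, and the toy data are all assumptions made for the example. The merged-mean formula in `cgl_merge` is the standard weighted mean and is likewise an assumption, since the abstract gives only the identity for M.

```python
# Illustrative sketch; names and data layout are assumptions, not the paper's code.

def gram_cov(xs):
    """Gram form: S = (G - s s^T / t) / (t - 1), from running sums G and s."""
    t, p = len(xs), len(xs[0])
    G = [[0.0] * p for _ in range(p)]   # running cross-product matrix
    s = [0.0] * p                       # running column sums
    for x in xs:
        for j in range(p):
            s[j] += x[j]
            for k in range(p):
                G[j][k] += x[j] * x[k]
    return [[(G[j][k] - s[j] * s[k] / t) / (t - 1) for k in range(p)]
            for j in range(p)]

def welford_state(xs):
    """Welford updates; returns the state (n, m, M)."""
    p = len(xs[0])
    n, m = 0, [0.0] * p
    M = [[0.0] * p for _ in range(p)]
    for x in xs:
        n += 1
        d_old = [x[j] - m[j] for j in range(p)]        # x_t - m_{t-1}
        m = [m[j] + d_old[j] / n for j in range(p)]    # m_t
        d_new = [x[j] - m[j] for j in range(p)]        # x_t - m_t
        for j in range(p):
            for k in range(p):
                M[j][k] += d_old[j] * d_new[k]
    return n, m, M

def cgl_merge(a, b):
    """Merge two block states with M = M_A + M_B + nA*nB/(nA+nB) * d d^T."""
    (nA, mA, MA), (nB, mB, MB) = a, b
    n, p = nA + nB, len(mA)
    d = [mB[j] - mA[j] for j in range(p)]              # m_B - m_A
    w = nA * nB / n
    m = [mA[j] + d[j] * nB / n for j in range(p)]      # weighted mean (assumed)
    M = [[MA[j][k] + MB[j][k] + w * d[j] * d[k] for k in range(p)]
         for j in range(p)]
    return n, m, M

def cov_from_state(state):
    """Unbiased estimator S = M / (n - 1)."""
    n, _, M = state
    return [[v / (n - 1) for v in row] for row in M]

# Toy data (hypothetical): five observations in p = 2 dimensions.
data = [[1.0, 2.0], [2.0, 4.5], [3.0, 5.5], [4.0, 9.0], [5.0, 9.5]]
S_gram = gram_cov(data)
S_welford = cov_from_state(welford_state(data))
S_cgl = cov_from_state(cgl_merge(welford_state(data[:2]), welford_state(data[2:])))
```

On well-scaled data like this, the three results agree to rounding error. Adding a large constant to every observation is a simple way to see the Gram form lose digits while the Welford and merge forms stay accurate.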
