PCA of probability measures: Sparse and Dense sampling regimes

Abstract

A common approach to performing PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embedding of a single measure from m samples are well understood, the literature has not addressed the setting involving multiple measures. In this paper, we study PCA in a double asymptotic regime where n probability measures are observed, each through m samples. We derive convergence rates of the form $n^{-1/2} + m^{-\alpha}$ for the empirical covariance operator and the PCA excess risk, where $\alpha > 0$ depends on the chosen embedding. This characterizes the relationship between the number n of measures and the number m of samples per measure, revealing a sparse (small m) to dense (large m) transition in the convergence behavior. Moreover, we prove that the dense-regime rate is minimax optimal for the empirical covariance error. Our numerical experiments validate these theoretical rates and demonstrate that appropriate subsampling preserves PCA accuracy while reducing computational cost.
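To make the stated rate concrete, the following minimal Python sketch evaluates the two terms of a bound of the form n^{-1/2} + m^{-α} and finds the per-measure sample size at which they balance. It rests on assumptions not taken from the paper: multiplicative constants are ignored, the function names (`rate_terms`, `balancing_m`, `regime`) are hypothetical, and labelling a setting "sparse" or "dense" by which term dominates is only an illustrative reading of the transition described in the abstract, not the authors' definition.

```python
import math


def rate_terms(n, m, alpha):
    """Return the two terms (n^{-1/2}, m^{-alpha}) of the bound, constants ignored."""
    return n ** -0.5, m ** -alpha


def balancing_m(n, alpha):
    """Approximate smallest m with m^{-alpha} <= n^{-1/2}, i.e. m >= n^{1/(2*alpha)}.

    Computed in floating point, so the result can be off by one for some inputs.
    """
    return math.ceil(n ** (1.0 / (2.0 * alpha)))


def regime(n, m, alpha):
    """Illustrative label (an assumption): 'dense' once the m-term no longer dominates."""
    n_term, m_term = rate_terms(n, m, alpha)
    return "dense" if m_term <= n_term else "sparse"


if __name__ == "__main__":
    n, alpha = 100, 0.5  # hypothetical values, chosen only for illustration
    print("balancing m:", balancing_m(n, alpha))
    for m in (10, 100, 1000):
        print(m, rate_terms(n, m, alpha), regime(n, m, alpha))
```

Under this reading, drawing many more than roughly n^{1/(2α)} samples per measure would not improve the order of the bound, which is consistent with the abstract's remark that appropriate subsampling can preserve accuracy while reducing computational cost.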

