New asymptotic results in principal component analysis
Abstract
Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space $\mathbb{H}$ with covariance operator $\Sigma := \mathbb{E}(X \otimes X)$. Let $\Sigma = \sum_{r \geq 1} \mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1 > \mu_2 > \dots$ and the corresponding spectral projectors $P_1, P_2, \dots$. Given a sample $X_1, \dots, X_n$ of size $n$ of i.i.d. copies of $X$, the sample covariance operator is defined as $\hat{\Sigma}_n := n^{-1} \sum_{j=1}^{n} X_j \otimes X_j$. The main goal of principal component analysis is to estimate the spectral projectors $P_1, P_2, \dots$ by their empirical counterparts $\hat{P}_1, \hat{P}_2, \dots$, properly defined in terms of the spectral decomposition of the sample covariance operator $\hat{\Sigma}_n$. The aim of this paper is to study asymptotic distributions of important statistics related to this problem, in particular, of the statistic $\|\hat{P}_r - P_r\|_2^2$, where $\|\cdot\|_2^2$ is the squared Hilbert--Schmidt norm. This is done in a "high-complexity" asymptotic framework in which the so-called effective rank $\mathbf{r}(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_{\infty}}$ ($\mathrm{tr}(\cdot)$ being the trace and $\|\cdot\|_{\infty}$ being the operator norm) of the true covariance $\Sigma$ is becoming large simultaneously with the sample size $n$, but $\mathbf{r}(\Sigma) = o(n)$ as $n \to \infty$. In this setting, we prove that, in the case of a one-dimensional spectral projector $P_r$, the properly centered and normalized statistic $\|\hat{P}_r - P_r\|_2^2$ with data-dependent centering and normalization converges in distribution to a Cauchy type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
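The quantities defined in the abstract can be illustrated numerically in a finite-dimensional setting. The sketch below is not the paper's method: it assumes a diagonal covariance with a hypothetical spectrum (4, 2, then 0.5 repeated), replaces the Hilbert space by R^dim, and uses NumPy. The function names `effective_rank`, `spectral_projector`, `projector_error`, and `simulate` are illustrative only. It computes the effective rank, the sample covariance, the empirical projector onto the top eigenvector, and the squared Hilbert--Schmidt distance to the true projector.

```python
import numpy as np


def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_op."""
    return np.trace(sigma) / np.linalg.norm(sigma, ord=2)


def spectral_projector(sigma, r):
    """Rank-one projector for the r-th largest eigenvalue (r is 1-based).

    Assumes that eigenvalue is simple, as in the one-dimensional case
    discussed in the abstract.
    """
    _, eigvecs = np.linalg.eigh(sigma)  # eigenvalues come back in ascending order
    v = eigvecs[:, -r]
    return np.outer(v, v)


def projector_error(p_hat, p):
    """Squared Hilbert-Schmidt (Frobenius) norm of p_hat - p."""
    return np.linalg.norm(p_hat - p, ord="fro") ** 2


def simulate(n=2000, dim=50, seed=0):
    """Draw n Gaussian vectors and compare the empirical and true top projectors."""
    rng = np.random.default_rng(seed)
    # Hypothetical spectrum chosen only for illustration.
    mu = np.concatenate(([4.0, 2.0], np.full(dim - 2, 0.5)))
    sigma = np.diag(mu)
    # Rows are i.i.d. N(0, Sigma) because Sigma is diagonal.
    x = rng.standard_normal((n, dim)) * np.sqrt(mu)
    sigma_hat = x.T @ x / n  # sample covariance for mean zero data
    p1 = spectral_projector(sigma, 1)
    p1_hat = spectral_projector(sigma_hat, 1)
    return effective_rank(sigma), projector_error(p1_hat, p1)


if __name__ == "__main__":
    rank, err = simulate()
    print(f"effective rank: {rank:.2f}, squared HS error: {err:.4f}")
```

Because both projectors have rank one, the squared Hilbert--Schmidt error always lies between 0 and 2. Repeating `simulate` over many seeds gives an empirical picture of how the statistic fluctuates, although the centering and normalization studied in the paper are not reproduced here.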