Number of relevant directions in Principal Component Analysis and Wishart random matrices
Abstract
We compute analytically, for large $N$, the probability $P(N_+,N)$ that an $N\times N$ Wishart random matrix has $N_+$ eigenvalues exceeding a threshold $N\zeta$, including its large deviation tails. This probability plays a benchmark role when performing the Principal Component Analysis of a large empirical dataset. We find that $P(N_+,N)\approx\exp\left(-\beta N^2\,\psi_\zeta(N_+/N)\right)$, where $\beta$ is the Dyson index of the ensemble and $\psi_\zeta(\kappa)$ is a rate function that we compute explicitly in the full range $0\le\kappa\le 1$ and for any $\zeta$. The rate function $\psi_\zeta(\kappa)$ displays a quadratic behavior modulated by a logarithmic singularity close to its minimum $\kappa^\star(\zeta)$. This is shown to be a consequence of a phase transition in an associated Coulomb gas problem. The variance $\Delta(N)$ of the number of relevant components is also shown to grow universally (independent of $\zeta$) as $\Delta(N)\sim(\beta\pi^2)^{-1}\ln N$ for large $N$.
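The counting problem described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' analytical method. It samples Wishart matrices, counts the eigenvalues above the threshold N*zeta, and compares the mean fraction with the typical value predicted by the Marchenko-Pastur law. This typical value is where the rate function has its minimum.

The sketch rests on several assumptions that are not taken from the abstract:
- The matrices are real Gaussian, so beta = 1.
- The Wishart matrix is W = X X^T with X square (N x N) and unit-variance entries. Under this convention, the eigenvalues of W/N fill [0, 4] with density sqrt((4 - x)/x) / (2 pi) for large N.

The function names `sample_n_plus` and `mp_fraction_above` are hypothetical.

```python
import numpy as np


def sample_n_plus(N, zeta, n_samples=2000, seed=0):
    """Monte Carlo samples of N_+, the number of eigenvalues of W exceeding N*zeta.

    Assumed setup (illustrative only): W = X X^T with X an N x N matrix of
    i.i.d. real standard Gaussians, i.e. the beta = 1 Wishart ensemble.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(n_samples, dtype=int)
    for k in range(n_samples):
        X = rng.standard_normal((N, N))
        # W is symmetric positive semi-definite, so eigvalsh applies
        eigs = np.linalg.eigvalsh(X @ X.T)
        counts[k] = np.count_nonzero(eigs > N * zeta)
    return counts


def mp_fraction_above(zeta, n_grid=200_000):
    """Typical large-N fraction of eigenvalues of W/N above zeta (0 < zeta < 4).

    Integrates the assumed Marchenko-Pastur density for a square matrix,
    rho(x) = sqrt((4 - x) / x) / (2 pi) on [0, 4], from zeta to 4 with
    the midpoint rule.
    """
    edges = np.linspace(zeta, 4.0, n_grid + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    rho = np.sqrt((4.0 - mid) / mid) / (2.0 * np.pi)
    return float(np.sum(rho) * (edges[1] - edges[0]))


if __name__ == "__main__":
    N, zeta = 50, 1.0
    counts = sample_n_plus(N, zeta, n_samples=500)
    print("mean N_+/N:", counts.mean() / N, "typical:", mp_fraction_above(zeta))
    print("variance of N_+:", counts.var())
```

For zeta = 1, the integral has the closed form 2/3 - sqrt(3)/(2 pi), which is about 0.391. The sampled mean fraction should approach this value as N grows. The sampled variance stays O(1) rather than growing with N. Repeating the run at several values of N should show roughly logarithmic growth of the variance, close to ln(N) / (beta pi^2) with beta = 1 here. Finite-N corrections of order one are expected, so this check is only indicative. Rare values of N_+ far from the typical fraction, which the large deviation tails describe, are essentially never reached by this naive sampling.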