Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors
Abstract
Consider two high-dimensional random vectors $\widetilde{\mathbf x}\in\mathbb R^p$ and $\widetilde{\mathbf y}\in\mathbb R^q$ with finite rank correlations. More precisely, suppose that $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$ and $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$, for independent random vectors $\mathbf x\in\mathbb R^p$, $\mathbf y\in\mathbb R^q$ and $\mathbf z\in\mathbb R^r$ with iid entries of mean 0 and variance 1, and two deterministic matrices $A\in\mathbb R^{p\times r}$ and $B\in\mathbb R^{q\times r}$. With $n$ iid observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between them. In this paper, we focus on the high-dimensional setting with a rank-$r$ correlation. Let $t_1\ge\cdots\ge t_r$ be the squares of the population canonical correlation coefficients (CCC) between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$, and let $\lambda_1\ge\cdots\ge\lambda_r$ be the squares of the largest $r$ sample CCC. Under certain moment assumptions on the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, we show that there exists a threshold $t_c\in(0,1)$ such that if $t_i>t_c$, then $\sqrt{n}(\lambda_i-\theta_i)$ converges in law to a centered normal distribution, where $\theta_i>\lambda_+$ is a fixed outlier location determined by $t_i$. Our results extend the ones in [4] for Gaussian vectors. Moreover, we find that the variance of the limiting distribution of $\sqrt{n}(\lambda_i-\theta_i)$ also depends on the fourth cumulants of the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, a phenomenon that cannot be observed in the Gaussian case.
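As an illustration of the setting, the following is a minimal simulation sketch, not the paper's method. It draws n observations from the signal-plus-noise model described above and computes the squared sample CCC as squared singular values of the product of orthonormal bases of the two data matrices. The function names, the use of Gaussian entries, the rank-one choice of A and B, and the use of uncentered data (the entries have mean 0) are all assumptions made for this example.

```python
import numpy as np


def squared_population_ccc(A, B):
    """Squared population CCC for observations x + A z and y + B z.

    Assumes x, y, z have identity covariance, so the covariances are
    I + A A^T, I + B B^T, and the cross-covariance is A B^T.
    """
    p, q = A.shape[0], B.shape[0]
    Sxx = np.eye(p) + A @ A.T
    Syy = np.eye(q) + B @ B.T
    Sxy = A @ B.T
    # Eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are the squared CCC.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    return np.sort(np.linalg.eigvals(M).real)[::-1]


def squared_sample_ccc(X, Y):
    """Squared sample CCC of data matrices X (n x p) and Y (n x q).

    Rows are observations; data are not centered (illustrative assumption).
    """
    Qx, _ = np.linalg.qr(X)  # orthonormal basis of the column space of X
    Qy, _ = np.linalg.qr(Y)  # orthonormal basis of the column space of Y
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # descending order
    return np.clip(s, 0.0, 1.0) ** 2


def simulate(n, p, q, A, B, rng):
    """Draw n observations of the model and return squared sample CCC."""
    r = A.shape[1]
    x = rng.standard_normal((n, p))  # Gaussian entries: an assumption
    y = rng.standard_normal((n, q))
    z = rng.standard_normal((n, r))
    x_obs = x + z @ A.T  # each row is x + A z
    y_obs = y + z @ B.T  # each row is y + B z
    return squared_sample_ccc(x_obs, y_obs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q = 1000, 100, 50
    A = np.zeros((p, 1)); A[0, 0] = 2.0  # hypothetical rank-one signal
    B = np.zeros((q, 1)); B[0, 0] = 2.0
    print("population t_1:", squared_population_ccc(A, B)[0])
    lam = simulate(n, p, q, A, B, rng)
    print("largest two squared sample CCC:", lam[:2])
```

With a sufficiently strong signal, the largest squared sample CCC should separate from the remaining ones. Replacing the Gaussian draws with another mean-0, variance-1 distribution is one way to explore the dependence on fourth cumulants mentioned above.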