Sharp Information-Theoretic Thresholds for Shuffled Linear Regression
Abstract
This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model y = * X β* + w, where X is an n × d standard Gaussian design matrix, w is Gaussian noise with entrywise variance σ2, * is an unknown n × n permutation matrix, and β* is the regression coefficient, also unknown. Previous work has shown that, in the large n-limit, the minimal signal-to-noise ratio (SNR), β* 2/σ2, for recovering the unknown permutation exactly with high probability is between n2 and nC for some absolute constant C and the sharp threshold is unknown even for d=1. We show that this threshold is precisely SNR = n4 for exact recovery throughout the sublinear regime d=o(n). As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be SNR = n2, where all but a vanishing fraction of the permutation is reconstructed.