Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning

Abstract

Partial Least Squares (PLS) learns shared structure from paired data via the top singular vectors of the empirical cross-covariance (PLS-SVD), but multimodal datasets often have missing entries in both views. We study PLS-SVD under independent entry-wise missing-completely-at-random masking in a proportional high-dimensional spiked model. After appropriate normalization, the masked cross-covariance behaves like a spiked rectangular random matrix whose effective signal strength is attenuated by ρ, where ρ is the joint entry retention probability. The replica-symmetric analysis predicts a sharp BBP-type phase transition: below a critical signal-to-noise threshold the leading singular vectors are asymptotically uninformative, while above it they achieve nontrivial alignment with the latent shared directions, with closed-form asymptotic overlap formulas. We also state a finite-rank extension as a conjecture, predicting that the same missingness-adjusted threshold applies componentwise when the latent spikes are separated. Simulations and semi-synthetic multimodal experiments agree with the predicted phase diagram and recovery curves across aspect ratios, signal strengths, and missingness levels.
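
The pipeline described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under assumptions that the abstract does not specify: a rank-one Gaussian spiked model, independent Bernoulli masks with retention probabilities `keep_x` and `keep_y` (so the joint retention probability is taken to be ρ = `keep_x * keep_y`), and rescaling of the zero-filled cross-covariance by 1/ρ. The function name `masked_pls_svd_overlap` and all of its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def masked_pls_svd_overlap(n=2000, p=200, q=150, snr=3.0,
                           keep_x=0.8, keep_y=0.8, seed=0):
    """Simulate rank-one PLS-SVD under entry-wise MCAR masking.

    Returns the absolute overlaps between the leading left/right singular
    vectors of the masked cross-covariance and the planted directions.
    Assumed model (illustrative only): X = sqrt(snr) z u^T + noise and
    Y = sqrt(snr) z v^T + noise, with a shared latent factor z.
    """
    rng = np.random.default_rng(seed)

    # Planted unit-norm shared directions in each view.
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(q)
    v /= np.linalg.norm(v)

    # A shared latent factor drives both views; the noise is i.i.d. Gaussian.
    z = rng.standard_normal(n)
    X = np.sqrt(snr) * np.outer(z, u) + rng.standard_normal((n, p))
    Y = np.sqrt(snr) * np.outer(z, v) + rng.standard_normal((n, q))

    # Independent entry-wise MCAR masks; missing entries are zero-filled.
    X_masked = np.where(rng.random((n, p)) < keep_x, X, 0.0)
    Y_masked = np.where(rng.random((n, q)) < keep_y, Y, 0.0)

    # Zero-filling shrinks the expected cross-covariance by rho, so
    # dividing by rho makes the estimate unbiased (assumed normalization).
    rho = keep_x * keep_y
    S = X_masked.T @ Y_masked / (n * rho)

    # PLS-SVD: leading singular vectors of the empirical cross-covariance.
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    return abs(U[:, 0] @ u), abs(Vt[0] @ v)


if __name__ == "__main__":
    for snr in (0.0, 0.5, 1.0, 2.0, 4.0):
        ou, ov = masked_pls_svd_overlap(snr=snr)
        print(f"snr={snr:4.1f}  overlap_u={ou:.3f}  overlap_v={ov:.3f}")
```

Sweeping `snr`, `keep_x`, and `keep_y` in this sketch gives an empirical recovery curve to compare qualitatively with the phase-transition picture in the abstract. It does not implement the paper's closed-form threshold or overlap formulas, which the abstract does not state.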
