UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures
Abstract
A central difficulty in training Joint-Embedding Predictive Architectures (JEPAs) is preventing representation collapse. LeJEPA addresses this by enforcing an isotropic Gaussian target on the embeddings via Sketched Isotropic Gaussian Regularization (SIGReg). This target is in tension with the manifold hypothesis, which expects embeddings to concentrate on a low-dimensional subset of the ambient space. We propose UR-JEPA, which targets a uniformly n-rectifiable measure of local tangent dimension n at small scales, realized through a Gaussian-kernel smoothed Carleson-type square function LCGLT, with a complementary Jones β-number formulation. On Inet10, UR-JEPA(LCGLT) attains 0.9141 0.0014 for a +0.83\,pp gain over LeJEPA(LSIGReg) with 30\% lower seed standard deviation; on matched-recipe Galaxy10~SDSS, a single-seed ImageNet-100 run, and a 3-seed EuroSAT remote-sensing run, the two methods lie in the same peak-accuracy band at convergence, with UR-JEPA retaining its lower-seed-variance signature. On EuroSAT the in-domain pair is competitive at 96.0 to 96.1\% with large remote-sensing foundation-model transfer at a 25× smaller backbone. The distinction is geometric: direct visualization of the projector output distribution shows that on all four datasets UR--JEPA(LCGLT) produces a global PCA spectrum with a 4 to 5 order-of-magnitude drop at index 20 to 25 out of D = 32, while LeJEPA's spectrum is near-flat (top-to-bottom ratio at most 3.6). Per-dimension marginals are simultaneously near-Gaussian for both methods (mean Shapiro-Wilk W ∈ [0.992, 0.996]) as a Diaconis-Freedman consequence. At matched accuracy the two regularizers therefore yield structurally distinct projected representations.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.