The E-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality
Abstract
We present the E-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform to obtain input-adaptive, unconditionally orthogonal residual connections. Unlike DDL, whose Householder operator is orthogonal only at β ∈ {0, 2}, our Data-Dependent Cayley rotation Q(x) = (I + (β/2)A(x))^{-1} (I - (β/2)A(x)) preserves orthogonality for all β and all inputs. Negation requires an eigenvalue of -1, which the Cayley transform provably excludes. To handle it, we introduce the E-MHC-Geo Hybrid, which combines Cayley rotation with Householder reflection via a learned operator-selection gate: X' = γ(X) Q(X) X + (1 - γ(X)) H_2(X) X. A midpoint-collapse regularizer, 4γ(1 - γ), encourages gate decisions at the boundary values γ ∈ {0, 1}, where the selected component is orthogonal. We run matched-parameter comparisons (approximately 1.79M parameters per model, mean ± standard deviation over 3 seeds) against four baselines, including the concurrent JPmHC. E-MHC-Geo achieves the best long-horizon stability (1.9× over JPmHC and 3.8× over GPT); the best near-π rotation loss (4.5× over JPmHC on single-plane); strong norm preservation (0.001 mean deviation); and 0.96 negation cosine alignment in a diagnostic reflection probe, all with 33% fewer layers. While JPmHC's wider representation excels on pure rotation, its finite Cayley residual mixer excludes an exact λ = -1 operator and has no reflection branch, motivating our hybrid approach for accessing both connected components of O(n).
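The orthogonality claims in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a fixed random skew-symmetric matrix in place of the learned, input-dependent A(x), a fixed unit vector for the Householder direction, and scalar values in place of the learned gate γ(X). The helper names (`skew`, `cayley`, `householder`, `orth_error`, `midpoint_penalty`) are hypothetical.

```python
import numpy as np


def skew(M):
    """Skew-symmetric part of M, so that A.T == -A."""
    return 0.5 * (M - M.T)


def cayley(A, beta):
    """Q = (I + (beta/2) A)^{-1} (I - (beta/2) A) for skew-symmetric A."""
    I = np.eye(A.shape[0])
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(I + 0.5 * beta * A, I - 0.5 * beta * A)


def householder(k, beta):
    """DDL-style operator I - beta * k k^T, with k normalized to unit length."""
    k = k / np.linalg.norm(k)
    return np.eye(k.shape[0]) - beta * np.outer(k, k)


def orth_error(M):
    """Largest absolute entry of M^T M - I (zero for an orthogonal M)."""
    return float(np.max(np.abs(M.T @ M - np.eye(M.shape[0]))))


def midpoint_penalty(gamma):
    """Midpoint-collapse term 4 * gamma * (1 - gamma); peaks at gamma = 0.5."""
    return 4.0 * gamma * (1.0 - gamma)


rng = np.random.default_rng(0)
n = 6
A = skew(rng.standard_normal((n, n)))  # stand-in for the learned A(x)
k = rng.standard_normal(n)             # stand-in for the learned direction

# Cayley stays orthogonal for every beta; Householder only at beta in {0, 2}.
cayley_errs = {b: orth_error(cayley(A, b)) for b in (0.0, 0.5, 1.0, 2.0, 10.0)}
house_errs = {b: orth_error(householder(k, b)) for b in (0.0, 1.0, 2.0)}

# Rotation (det +1) and reflection (det -1) lie in different components of O(n).
Q = cayley(A, 1.0)
H2 = householder(k, 2.0)
det_Q = float(np.linalg.det(Q))
det_H = float(np.linalg.det(H2))

# A blended gate value mixes the two operators and loses orthogonality.
blend_err = orth_error(0.5 * Q + 0.5 * H2)

print("Cayley orthogonality error by beta:", cayley_errs)
print("Householder orthogonality error by beta:", house_errs)
print("det Q:", det_Q, "det H2:", det_H)
print("orthogonality error of the gamma = 0.5 blend:", blend_err)
print("penalty at gamma = 0, 0.5, 1:", [midpoint_penalty(g) for g in (0.0, 0.5, 1.0)])
```

Under these assumptions, the Cayley errors stay at floating-point level for every β tested, while the Householder operator is orthogonal only at β = 0 and β = 2. The determinants show why a reflection branch is needed: Q has determinant +1 and H_2 has determinant -1. The γ = 0.5 blend is far from orthogonal, which is the situation the 4γ(1 - γ) term penalizes most heavily.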