A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

Lei Dong

Abstract

Pre-norm Transformers with RMSNorm tolerate ternary {-1, 0, +1} weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via a sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce π/(π-2) ≈ 2.75 times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, in the limit where the flip rate p → 0 (Theorem 3). The mechanism is that ReLU creates a hidden-space directional asymmetry between the two perturbation types, which the transverse-projection Fréchet derivative of RMSNorm selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment √(2/π) (Theorem 4); its post-ReLU radial fraction (0.365) matches the pre-ReLU value 1-2/π within 0.4%, so ReLU is approximately transparent to ternary error. Multi-layer compounding of the 2.75× factor is not experimentally supported; instead, the gap to real-model sign sensitivity arises from outlier features that violate delocalization. For an input dimension with amplitude α, a single sign-flip produces post-ReLU energy amplified by R ≈ nα² relative to a delocalized entry. On TinyLlama-1.1B, in the linear-response regime (p ≤ 0.5%), count-matched NLL leverage stabilizes at 10× ≈ nE[α²], matching the per-entry theory; the all-column NLL ratio of 5.0× falls within the bound R_col ≤ 19 (the 67× PPL gap reflects metric nonlinearity). Measured outlier α at layer 12 (median 0.024, max 0.26) confirms heavy-tailed concentration. The Bussgang constant 2/π, RMSNorm geometry, and ReLU half-space structure together explain sign-magnitude asymmetry in pre-norm models, with R ≈ nα² accounting for real-model deviations.
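The constants quoted in the abstract can be sanity-checked numerically. The following is a minimal sketch, not the paper's experiment: it assumes a single vector of i.i.d. standard Gaussian weights, quantizes it to its sign pattern, and takes the scale to be the least-squares optimal one (the mean absolute weight). The function name `sign_quantization_stats` and its parameters are hypothetical, chosen for illustration only. Under these assumptions the cosine between the weights and their signs should be close to √(2/π), and the relative energy of the quantization error close to 1-2/π.

```python
import math

import numpy as np


def sign_quantization_stats(n=200_000, seed=0):
    """Monte Carlo estimate for i.i.d. N(0, 1) weights (illustrative assumption).

    Returns (cosine between w and sign(w), relative energy of the error
    w - scale * sign(w) with the least-squares optimal scale).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    q = np.sign(w)  # sign pattern in {-1, +1} (exact zeros have probability 0)

    # Angular alignment between the weights and their sign pattern.
    cos = float(w @ q / (np.linalg.norm(w) * np.linalg.norm(q)))

    # Least-squares optimal scale for approximating w by scale * q is mean |w|.
    scale = float(np.mean(np.abs(w)))
    err = w - scale * q
    err_fraction = float(err @ err / (w @ w))
    return cos, err_fraction


cos, err_fraction = sign_quantization_stats()
print("cosine(w, sign(w)):", cos, "vs sqrt(2/pi):", math.sqrt(2 / math.pi))
print("error energy fraction:", err_fraction, "vs 1 - 2/pi:", 1 - 2 / math.pi)
print("pi / (pi - 2):", math.pi / (math.pi - 2))
```

With the optimal scale, the error is orthogonal to the sign pattern, so the error fraction equals one minus the squared cosine for any sample, which is why the two printed estimates move together. The last line only evaluates the closed form π/(π-2); reproducing the sign-flip versus magnitude-perturbation energy ratio itself would require the two-layer ReLU + RMSNorm setup of Theorem 3, which this sketch does not attempt.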
