From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Abstract

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the predictable scaling laws seen in large language models (LLMs). We identify the root cause as a fundamental structural misalignment: standard Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over heterogeneous fields. To restore alignment, we introduce the Field-Aware Transformer (FAT). By reconstructing the standard Transformer block with field-centric parameters, FAT achieves structured expressivity, shifting the dependence of model complexity from the total vocabulary size n to the number of fields F (n ≫ F). Crucially, to decouple model capacity from field cardinality, FAT employs a Basis-Composed Hypernetwork that synthesizes field-specific parameters from shared bases, further reducing parameter complexity. Theoretically, we ground this scaling behavior in a formal scaling law based on Rademacher complexity. Empirically, FAT outperforms existing state-of-the-art methods with up to +4.38% AUC improvement, and delivers +2.33% CTR and +0.66% RPM in live production. Our work establishes that scalable recommendation arises not from size alone, but from structured expressivity: architectural coherence with data semantics.
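
The abstract does not spell out how the Basis-Composed Hypernetwork builds field-specific parameters. As a rough illustration only, the sketch below assumes the simplest reading: each field's weight matrix is a linear combination of a small set of shared basis matrices, with one coefficient vector per field. The function names, the linear-combination form, and the sizes used are all assumptions for illustration, not the paper's stated method.

```python
def compose_field_matrix(bases, coeffs):
    """Build one field's matrix as sum_k coeffs[k] * bases[k].

    Assumption: a plain linear combination of shared bases; the paper's
    actual hypernetwork may differ.
    """
    rows, cols = len(bases[0]), len(bases[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for c, basis in zip(coeffs, bases):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += c * basis[i][j]
    return out


def param_counts(num_fields, num_bases, dim):
    """Compare parameter counts for dim x dim matrices.

    per_field:      one independent matrix per field.
    basis_composed: shared bases plus one coefficient vector per field.
    """
    per_field = num_fields * dim * dim
    basis_composed = num_bases * dim * dim + num_fields * num_bases
    return per_field, basis_composed


# Two hypothetical 2x2 bases and one field's mixing coefficients.
bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
field_matrix = compose_field_matrix(bases, [2.0, 3.0])

# Hypothetical sizes: 100 fields, 8 bases, 64-dimensional matrices.
per_field, basis_composed = param_counts(100, 8, 64)
```

Under this assumption, the matrix-sized cost is paid once per basis rather than once per field, and each additional field adds only a short coefficient vector. That is one way parameter count could be decoupled from the number of fields.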
