Transformer-1: Input-Adaptive Computation for Resource-Constrained Deployment

Abstract

Addressing the resource waste caused by fixed computation paradigms in deep learning models under dynamic scenarios, this paper proposes a Transformer-1 architecture based on the principle of deep adaptivity. This architecture achieves dynamic matching between input features and computational resources by establishing a joint optimization model for complexity and computation. Our core contributions include: (1) designing a two-layer control mechanism, composed of a complexity predictor and a reinforcement learning policy network, enabling end-to-end optimization of computation paths; (2) deriving a lower bound theory for dynamic computation, proving the system's theoretical reach to optimal efficiency; and (3) proposing a layer folding technique and a CUDA Graph pre-compilation scheme, overcoming the engineering bottlenecks of dynamic architectures. In the ImageNet-1K benchmark test, our method reduces FLOPs by 42.7\% and peak memory usage by 34.1\% compared to the standard Transformer, while maintaining comparable accuracy (0.3\%). Furthermore, we conducted practical deployment on the Jetson AGX Xavier platform, verifying the effectiveness and practical value of this method in resource-constrained environments. To further validate the generality of the method, we also conducted experiments on several natural language processing tasks and achieved significant improvements in resource efficiency.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…