HiAP: A Multi-Granular Stochastic Auto-Pruning Framework for Vision Transformers
Abstract
Vision Transformers require significant computational resources and memory bandwidth, which severely limits their deployment on resource-constrained hardware. Most structured pruning methods reduce theoretical cost effectively, yet they typically operate at a single structural granularity and depend on multi-stage pipelines with importance ranking, auxiliary solvers, or post-hoc magnitude thresholding, followed by a separate fine-tuning phase to recover accuracy. We propose Hierarchical Auto-Pruning (HiAP), which casts ViT pruning as a single budget-aware learning problem and jointly allocates sparsity across four granularities in one end-to-end phase. HiAP introduces stochastic Gumbel-Sigmoid gates at the macro level (attention heads and FFN blocks) and the micro level (intra-head dimensions and FFN neurons), and optimizes them against the task loss together with an analytical MAC cost term. The budget coefficient steers the network to a target compute level while the gates gradually harden into a dense, smaller sub-network at convergence. HiAP requires no importance heuristics, ranking metrics, auxiliary solvers, or secondary fine-tuning. On ImageNet with DeiT-Small, HiAP automatically discovers heterogeneous architectures, pruning depth, heads, and width by different amounts across layers, and reaches accuracy competitive with substantially more complex pruning pipelines at comparable compute, all from a single training run.
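To make the gating idea concrete, the following is a minimal sketch of a Gumbel-Sigmoid gate and a gate-weighted MAC estimate for one FFN block. It is not the authors' implementation: the function names, the temperature parameter, the 2 * d_model * tokens per-neuron MAC count, and the choice to use the gate probability sigmoid(logit) as the expected gate value are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gumbel_sigmoid(logit, temperature, rng):
    """Draw one relaxed binary gate value in [0, 1].

    The difference of two Gumbel samples is logistic noise, so we
    sample that directly. Lower temperatures push samples toward 0 or 1,
    which is one way gates can "harden" over training (assumed schedule).
    """
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + logistic_noise) / temperature)


def expected_ffn_macs(block_logit, neuron_logits, d_model, tokens):
    """Gate-weighted MAC estimate for one FFN block (hypothetical form).

    Each hidden neuron costs 2 * d_model MACs per token (one row of the
    up-projection, one column of the down-projection). The macro gate
    scales the whole block; the micro gates count surviving neurons.
    """
    p_block = sigmoid(block_logit)
    expected_neurons = sum(sigmoid(l) for l in neuron_logits)
    return p_block * expected_neurons * 2 * d_model * tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(gumbel_sigmoid(0.0, temperature=1.0, rng=rng))
    print(expected_ffn_macs(0.0, [0.0] * 4, d_model=8, tokens=10))
```

In a budget-aware objective of the kind the abstract describes, a term like this would be summed over all gated units and added to the task loss, scaled by the budget coefficient. Because the estimate is a smooth function of the gate logits, it can be optimized by gradient descent alongside the weights.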