RQP: Resource-Oriented Quantiser Pruning for Neural Networks on FPGAs

Abstract

High granularity quantisation (HGQ) exploits weight-level quantisation and pruning to design resource-efficient neural network accelerators, achieving an attractive trade-off between accuracy and hardware utilisation. HGQ is particularly well suited to FPGA-based edge neural network applications. The standard HGQ workflow starts from a high-precision model and progressively reduces bit widths under gradient-based optimisation, tracing out the Pareto frontier. This monotonic, irreversible pruning process is computationally intensive and can miss the optimal subnetwork for a given resource level. We propose a resource-oriented, one-shot quantiser pruning method that brings the network directly close to the target region of the search space, followed by fine-tuning with bidirectional beta scheduling, which enables a finer scan of the Pareto frontier. Validated on the jet substructure classification (JSC) task, our method reduces the search cost by up to 20.58x compared with the monotonic resource reduction of the standard HGQ workflow, while achieving a competitive Pareto frontier and final network configuration.
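The abstract does not give implementation details, so the Python sketch below is only a hypothetical, simplified illustration of the three ideas it names: per-weight fixed-point quantisers (a zero-bit quantiser prunes its weight), a one-shot cut of quantisers down to a resource budget, and a bidirectional update of beta, assumed here (as in HGQ-style training) to weight a resource penalty in the loss, L = L_task + beta * R. All names (`quantise`, `one_shot_prune`, `update_beta`) are hypothetical. Further assumptions: the resource proxy R is the total bit count rather than EBOPs or real FPGA resource estimates, weight magnitude is the pruning criterion, and beta is updated multiplicatively inside a tolerance band.

```python
def quantise(w: float, bits: int, int_bits: int = 1) -> float:
    """Round w onto a signed fixed-point grid (assumption: int_bits includes the sign bit)."""
    if bits <= 0:
        return 0.0  # a zero-width quantiser removes (prunes) the weight entirely
    frac_bits = bits - int_bits
    step = 2.0 ** (-frac_bits)
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - step
    q = round(w / step) * step
    return min(max(q, lo), hi)  # saturate to the representable range


def one_shot_prune(weights: list[float], bits: list[int], budget: int) -> list[int]:
    """Hypothetical one-shot step: zero the bit widths of the smallest-magnitude
    weights until the total bit count (a crude resource proxy) fits the budget."""
    new_bits = list(bits)
    total = sum(new_bits)
    for i in sorted(range(len(weights)), key=lambda k: abs(weights[k])):
        if total <= budget:
            break
        total -= new_bits[i]
        new_bits[i] = 0
    return new_bits


def update_beta(beta: float, resource: float, target: float,
                factor: float = 1.1, tol: float = 0.02) -> float:
    """Bidirectional schedule (assumed form): raise beta when over the resource
    target, lower it when under, and hold it inside the tolerance band."""
    if resource > target * (1 + tol):
        return beta * factor  # over budget: weight the resource penalty more
    if resource < target * (1 - tol):
        return beta / factor  # under budget: relax and let accuracy recover
    return beta


weights = [0.9, -0.05, 0.4, 0.01, -0.7]
bits = one_shot_prune(weights, [6] * len(weights), budget=18)
print(bits)  # [6, 0, 6, 0, 6]
print([quantise(w, b) for w, b in zip(weights, bits)])
```

Unlike a monotonic schedule, in which beta only ever grows, the update above can move beta in either direction, so fine-tuning can recover bits after an overly aggressive one-shot cut as well as remove more.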
