@NTT: Algorithm-Targeted NTT hardware acceleration via Design-Time Constant Optimization
Abstract
The Number Theoretic Transform (NTT) is a critical computational bottleneck in many lattice-based postquantum cryptographic (PQC) algorithms. By leveraging the Fast Fourier Transform (FFT) algorithm, the NTT of a polynomial of degree N - 1 can be computed with a time complexity of O(N log N). Hardware implementation of NTT is generally preferred over software ones, as the latter are significantly slower due to complex memory access patterns and modular arithmetic operations. Achieving maximum throughput in hardware, however, typically demands a prohibitively large number of butterfly unit instantiations. In this work, we propose @NTT, which exploits the fact that the ring parameters in these algorithms are fixed, enabling design-time constant optimization and achieving the maximum throughput of N-point NTT per clock cycle with a compact hardware footprint. Our case study on the Dilithium NTT, implemented using the TSMC 28 nm library, operates at a clock frequency of 1.0 GHz with an area of 1.45 mm2. On FPGA, the design achieves a throughput-per-LUT that is 5.2x higher than the state-of-the-art implementation.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.