Range, Not Precision: Block-Floating-Point Half-Precision FFT and SAR Imaging on Apple Silicon
Abstract
Half precision (FP16) promises to double FFT throughput on GPUs, but the prevailing view is that its 10-bit mantissa makes it unsuitable for radar-grade signal processing. We show this framing is wrong on Apple Silicon: the binding constraint for FFT and Synthetic Aperture Radar (SAR) is not mantissa precision but the dynamic range of the 5-bit exponent. We first measure that an FP16 FFT is mantissa-limited at 56–61 dB signal-to-quantization-noise ratio (SQNR), which is comfortably radar-usable, yet a naïve FP16 SAR pipeline produces only NaN, because the conjugate–FFT–conjugate inverse transform grows magnitudes by a factor of N, and the matched-filter product (about 5×10⁶ at N = 4096) overflows FP16's 65,504 ceiling. We resolve this with a fixed-shift block-floating-point (BFP) schedule: a single 1/N scale applied before each inverse transform bounds every intermediate below 4096. A cascade follows: range-compression output becomes O(1) instead of O(N), which in turn keeps the downstream azimuth-FFT output FP16-loadable instead of overflowing at O(N²). The result is the first quality-preserving FP16 SAR pipeline: peak/integrated sidelobe ratios, target SNR, and resolution match the FP32 reference to within 0.1 dB at 42 dB end-to-end SQNR, while a radix-8 FP16 FFT reaches 306 GFLOPS (2.2× over the 139 GFLOPS FP32 baseline) on a fanless Apple M1. Finally, we measure that FP8 (E4M3/E5M2) collapses to 14–20 dB SQNR, making FP16 today's precision floor for FFT-based radar, one that future precision-recovery methods may yet lower, and showing that the lever for low precision here is range management, not mantissa bits.
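The range argument can be illustrated with a minimal NumPy sketch. It is not the Metal implementation described here: arithmetic runs in float64 and FP16 is emulated only when the result is stored, and the helper names (`ifft_via_conj`, `peak_after_fp16_store`), the 1024-sample unit-amplitude chirp, and its chirp rate are illustrative assumptions, so the magnitudes are not expected to reproduce the figures quoted above. The sketch only shows that an unnormalized conjugate-FFT-conjugate inverse of a matched-filter product exceeds the FP16 ceiling, while the same transform with a 1/N scale applied beforehand stays in range.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0, the largest finite FP16 value


def ifft_via_conj(spectrum, prescale=1.0):
    """Inverse DFT computed as conj(FFT(conj(X))).

    `prescale` is applied to the spectrum BEFORE the transform, mimicking a
    fixed 1/N shift; prescale=1.0 leaves the factor-of-N growth in place.
    """
    return np.conj(np.fft.fft(np.conj(spectrum * prescale)))


def peak_after_fp16_store(z):
    """Largest real or imaginary component after rounding to float16."""
    with np.errstate(over="ignore"):
        re = np.abs(z.real).astype(np.float16)
        im = np.abs(z.imag).astype(np.float16)
    return float(max(re.max(), im.max()))


N = 4096
pulse_len = 1024  # assumed pulse length, chosen only for illustration

# Unit-amplitude linear-FM chirp, zero-padded to N samples.
n = np.arange(pulse_len)
signal = np.zeros(N, dtype=np.complex128)
signal[:pulse_len] = np.exp(1j * np.pi * 0.5 * n**2 / pulse_len)

# Matched filtering in the frequency domain: spectrum times its conjugate.
X = np.fft.fft(signal)
product = X * np.conj(X)

unscaled = ifft_via_conj(product)                  # peak is N * pulse_len
scaled = ifft_via_conj(product, prescale=1.0 / N)  # peak is pulse_len

print("unscaled peak:", np.abs(unscaled).max(), "-> FP16:", peak_after_fp16_store(unscaled))
print("scaled peak:  ", np.abs(scaled).max(), "-> FP16:", peak_after_fp16_store(scaled))
```

By Parseval's theorem the unscaled peak equals N times the pulse energy, which is why the growth is a range problem rather than a rounding problem: the scaled output carries the same information, divided by N.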