Implementing FFTs in Practice
Abstract
This review article was first published in 2008 as chapter 11 in the book "Fast Fourier Transforms," edited by C. S. Burrus, for the Connexions project at Rice University, which is sadly no longer online. It gives a high-level overview of some of the engineering considerations that arise in high-performance implementations of fast Fourier transforms (FFTs). It explains why optimized FFTs differ substantially from textbook "radix-2 Cooley-Tukey" FFT algorithms: they must compensate for the memory hierarchy and exploit the large register sets and deep pipelines of modern CPUs. Using the FFTW library as a case study, it discusses tradeoffs in the use of recursion, generation of twiddle factors, code generation, and other algorithmic choices.