A parallel butterfly algorithm
Abstract
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform ∫ K(x,y) g(y) dy at large numbers of target points when the kernel, K(x,y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2 Nd log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most O(r2 Nd/p log N + β r Nd/p + α)log p) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x,y)=exp(i (x,y)), where (x,y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a 3D generalized Radon transform were respectively observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.