A Locality-Aware Bruck Allgather
Abstract
Collective algorithms are an essential part of MPI, allowing application programmers to take advantage of underlying optimizations of common distributed operations. The MPI_Allgather operation gathers data that is initially distributed across all processes, so that every process ends up with all of the data. For small data sizes, the Bruck algorithm is commonly used because it minimizes the maximum number of messages communicated by any process. However, the cost of each communication step depends on the relative locations of the source and destination processes: non-local messages, such as inter-node messages, are significantly more costly than local messages, such as intra-node messages. This paper optimizes the Bruck algorithm with locality-awareness, minimizing the number and size of non-local messages to improve the performance and scalability of the allgather operation.
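To make the baseline concrete, the following is a minimal single-process simulation of the standard (not locality-aware) Bruck allgather, written in Python rather than MPI so it can run anywhere. It is a sketch under stated assumptions, not the paper's implementation: the function name `bruck_allgather`, the parameter `procs_per_node`, and the block mapping of consecutive ranks to nodes are all hypothetical choices made for illustration. It counts how many point-to-point messages cross a node boundary, which is the cost the abstract describes as dominant.

```python
def bruck_allgather(values, procs_per_node=1):
    """Simulate the standard Bruck allgather over len(values) ranks.

    values[r] is the single block initially owned by rank r.
    procs_per_node is a hypothetical parameter: ranks are assumed to be
    placed on nodes in consecutive blocks of this size.
    Returns (gathered, steps, nonlocal_msgs).
    """
    n = len(values)
    # Each rank starts with only its own block.
    buffers = [[v] for v in values]
    steps = 0
    nonlocal_msgs = 0
    dist = 1
    while dist < n:
        # The final step may need fewer than `dist` blocks to reach n total.
        count = min(dist, n - dist)
        # Snapshot what every rank sends so the exchange is simultaneous.
        outgoing = [buf[:count] for buf in buffers]
        for rank in range(n):
            # Rank receives from (rank + dist) and appends to its buffer.
            src = (rank + dist) % n
            buffers[rank] = buffers[rank] + outgoing[src]
            if src // procs_per_node != rank // procs_per_node:
                nonlocal_msgs += 1
        dist *= 2
        steps += 1
    # Rank r now holds blocks r, r+1, ..., r+n-1 (mod n); rotate so that
    # block 0 comes first on every rank.
    gathered = []
    for rank, buf in enumerate(buffers):
        shift = (n - rank) % n
        gathered.append(buf[shift:] + buf[:shift])
    return gathered, steps, nonlocal_msgs


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e", "f"]
    gathered, steps, nonlocal_msgs = bruck_allgather(data, procs_per_node=3)
    print(gathered[0], steps, nonlocal_msgs)
```

Each rank sends one message per step, so the number of steps grows logarithmically with the number of processes, but in this simulation nothing steers those messages toward ranks on the same node. Reducing the count and size of the messages tallied in `nonlocal_msgs` is the goal the abstract states for the locality-aware variant.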