Effective MPI: User-defined Datatypes and Cartesian Communicators for Zero-copy All-to-all Communication in Multidimensional Tori

Abstract

We show how to implement a non-trivial all-to-all communication algorithm for arbitrary d-dimensional tori effectively in MPI. Given a factorization of the number of processes p into d factors that can be mapped onto a d-dimensional torus, we first use a Cartesian communicator to split a given p-process MPI communicator into d smaller communicators per process, one for each torus dimension, each spanning the processes along that dimension through the given process. These communicators are cached so that the expensive splitting is not repeated at every all-to-all operation. The all-to-all operation itself is decomposed into a sequence of d MPI_Alltoall operations on the dimension-wise communicators. The non-trivial data rearrangement before and after each MPI_Alltoall call is only implicit, carried out by MPI derived datatypes. This makes the implementation formally zero-copy: no explicit process-local reordering of data blocks is ever performed. To achieve this, the algorithm employs a double-buffering scheme with modest temporary buffer requirements. Through the choice of the factorization of p and of the implementations used for the component MPI_Alltoall operations, the implementation offers ample opportunities for tuning and for adaptation to a particular high-performance system. A few selected experimental results show performance competitive with native MPI_Alltoall implementations and illustrate problems that common MPI_Alltoall implementations may have.
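
The routing behind the d-phase decomposition can be checked without MPI. The following is a minimal sequential sketch in C, not the authors' implementation. It assumes row-major rank numbering (the numbering MPI uses for Cartesian communicators), a fixed dimension count D = 3 (a factor of 1 makes a dimension trivial), and a hypothetical checker named simulate_torus_alltoall. In phase k, each process sends every block it currently holds to the process in its dimension-k communicator whose k-th coordinate matches the block's final destination. The sketch verifies that each phase is a regular all-to-all with p/dims[k] blocks per peer and that every block has arrived after d phases. Unlike the zero-copy implementation described above, it moves blocks with explicit copies. In an MPI version, the per-phase regrouping would instead be described by derived datatypes, and each phase would be an MPI_Alltoall on a communicator such as one obtained from MPI_Cart_sub.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define D 3 /* number of torus dimensions; a factor of 1 makes a dimension trivial */

/* A data block tagged with its original sender and its final destination. */
typedef struct { int src, dst; } Block;

/* Row-major rank -> coordinates (last dimension varies fastest),
   the numbering MPI uses for Cartesian communicators. */
static void rank_to_coords(int r, const int dims[D], int coords[D]) {
    for (int k = D - 1; k >= 0; --k) {
        coords[k] = r % dims[k];
        r /= dims[k];
    }
}

static int coords_to_rank(const int dims[D], const int coords[D]) {
    int r = 0;
    for (int k = 0; k < D; ++k) r = r * dims[k] + coords[k];
    return r;
}

/* Hypothetical checker: runs D dimension-wise all-to-all phases sequentially.
   Returns 1 if every phase is a regular all-to-all (p/dims[k] blocks per peer)
   and every process finally holds one block from each source, all addressed
   to itself; returns 0 otherwise. */
static int simulate_torus_alltoall(const int dims[D]) {
    int p = 1;
    for (int k = 0; k < D; ++k) { assert(dims[k] > 0); p *= dims[k]; }

    Block *cur = malloc(sizeof(Block) * (size_t)p * p); /* cur[q*p+i]: i-th block held by q */
    Block *nxt = malloc(sizeof(Block) * (size_t)p * p); /* receive side of the current phase */
    int *fill = malloc(sizeof(int) * (size_t)p);        /* blocks received so far, per process */
    int *per_peer = malloc(sizeof(int) * (size_t)p);    /* blocks sent to each peer in a phase */
    char *seen = malloc((size_t)p);
    int ok = 1;

    for (int q = 0; q < p; ++q)
        for (int j = 0; j < p; ++j)
            cur[q * p + j] = (Block){ q, j }; /* block j of process q is for process j */

    for (int k = 0; k < D && ok; ++k) {
        memset(fill, 0, sizeof(int) * (size_t)p);
        for (int q = 0; q < p; ++q) {
            int qc[D], dc[D], peer[D];
            rank_to_coords(q, dims, qc);
            memset(per_peer, 0, sizeof(int) * (size_t)dims[k]);
            for (int i = 0; i < p; ++i) {
                Block b = cur[q * p + i];
                rank_to_coords(b.dst, dims, dc);
                memcpy(peer, qc, sizeof qc);
                peer[k] = dc[k]; /* move only along dimension k */
                int r = coords_to_rank(dims, peer);
                per_peer[dc[k]]++;
                if (fill[r] < p) nxt[r * p + fill[r]++] = b; else ok = 0;
            }
            for (int c = 0; c < dims[k]; ++c)
                if (per_peer[c] != p / dims[k]) ok = 0;
        }
        for (int r = 0; r < p; ++r)
            if (fill[r] != p) ok = 0;
        Block *t = cur; cur = nxt; nxt = t; /* swap the two buffers for the next phase */
    }

    for (int q = 0; q < p && ok; ++q) {
        memset(seen, 0, (size_t)p);
        for (int i = 0; i < p; ++i) {
            Block b = cur[q * p + i];
            if (b.dst != q || seen[b.src]) { ok = 0; break; }
            seen[b.src] = 1;
        }
    }

    free(cur); free(nxt); free(fill); free(per_peer); free(seen);
    return ok;
}
```

For example, simulate_torus_alltoall((const int[3]){2, 3, 4}) returns 1 for a 24-process 2 x 3 x 4 torus. The invariant it exercises is that after phase k, every block held by a process already agrees with that process in coordinates 0 through k, so the last phase leaves each block at its destination.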
