Fast Strassen-based At A Parallel Multiplication

Abstract

Matrix multiplication At A appears as intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the At A multiplication. Our algorithm, A TA, calls classical Strassen's algorithm as sub-routine, decreasing the computational cost %(expressed in number of performed products) of the conventional At A multiplication to 27n2 7. It works for generic rectangular matrices and exploits the peculiar symmetry of the resulting product matrix for sparing memory. We used the MPI paradigm to implement A TA in parallel, and we tested its performances on a small subset of nodes of the Galileo cluster. Experiments highlight good scalability and speed-up, also thanks to minimal number of exchanged messages in the designed communication system. Parallel overhead and inherently sequential time fraction are negligible in the tested configurations.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…