An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay
Abstract
We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534 (2010)], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 × 10⁻⁸ is lower than that of the single-precision SGEMM for dense quantum chemical matrices, while outperforming SGEMM with a cross-over already for small matrices (n ~ 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library (MKL) or AMD's Core Math Library (ACML), our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2-3x speedups.
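The abstract does not spell out the algorithm itself. As a rough illustration of the idea behind the cited SpAMM reference, the sketch below recursively splits the operands into quadrants and skips any sub-product whose Frobenius-norm bound falls below the tolerance. The names `frob`, `spamm`, `tau`, and `leaf` are hypothetical, the matrix dimension is assumed to be a power of two, and norms are recomputed on every call for brevity rather than cached in a tree. It is a minimal pure-Python sketch, not the optimized single-precision kernel the paper describes.

```python
import math


def frob(M, r0, c0, size):
    """Frobenius norm of the size x size block of M whose top-left corner is (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(size) for j in range(size)))


def spamm(A, B, C, tau, ra=0, ca=0, cb=0, size=None, leaf=2):
    """Accumulate an approximate product of square matrices A and B into C.

    The current blocks are A[ra.., ca..], B[ca.., cb..] and C[ra.., cb..].
    A sub-product is skipped when ||A_block||_F * ||B_block||_F < tau.
    Assumption: len(A) is a power of two (illustrative sketch only).
    """
    if size is None:
        size = len(A)
    # Norm-based culling: the skipped contribution is bounded by tau in norm.
    if frob(A, ra, ca, size) * frob(B, ca, cb, size) < tau:
        return
    if size <= leaf:
        # Small block: plain dense multiply-accumulate.
        for i in range(size):
            for k in range(size):
                a = A[ra + i][ca + k]
                for j in range(size):
                    C[ra + i][cb + j] += a * B[ca + k][cb + j]
        return
    h = size // 2
    # Recurse over the eight quadrant products C_ij += A_ik * B_kj.
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                spamm(A, B, C, tau, ra + i, ca + k, cb + j, h, leaf)
```

To use it, allocate `C` as an n x n list of zeros and call `spamm(A, B, C, tau)`. With `tau = 0.0` nothing is culled and the result is the ordinary product; for a matrix with decay, such as one with entries exp(-|i - j|), larger values of `tau` prune progressively more off-diagonal work.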