An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Matt Challacombe

Abstract

We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534 (2010)], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 × 10^-8 is lower than that of the single-precision SGEMM for dense quantum chemical matrices, while SpAMM outperforms SGEMM with a cross-over already for small matrices (n ~ 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library (MKL) or AMD's Core Math Library (ACML), our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2-3x speedups.
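
To make the idea of an approximate multiply with a tolerance concrete, the following is a minimal sketch of a recursive, norm-culled multiply in the spirit of SpAMM. It is not the optimized single-precision implementation described in the abstract. The function names (`spamm`, `frob`, `split`, `join`), the use of the Frobenius norm, the leaf size, and the restriction to square matrices whose dimension is a power of two are all assumptions made for illustration.

```python
import math

def frob(M):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in M for x in row))

def split(M):
    """Split a square matrix of even dimension into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [a + b for a, b in zip(C11, C12)]
    bottom = [a + b for a, b in zip(C21, C22)]
    return top + bottom

def spamm(A, B, tau, leaf=2):
    """Approximate A @ B, skipping sub-products whose norm product is below tau.

    Illustrative assumption: A and B are square with a power-of-two dimension.
    """
    n = len(A)
    # Cull: the product of this pair of blocks is bounded in norm by
    # frob(A) * frob(B), so it is dropped when that bound is below tau.
    if frob(A) * frob(B) < tau:
        return [[0.0] * n for _ in range(n)]
    # Leaf: fall back to a plain dense multiply on small blocks.
    if n <= leaf:
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    C11 = add(spamm(A11, B11, tau, leaf), spamm(A12, B21, tau, leaf))
    C12 = add(spamm(A11, B12, tau, leaf), spamm(A12, B22, tau, leaf))
    C21 = add(spamm(A21, B11, tau, leaf), spamm(A22, B21, tau, leaf))
    C22 = add(spamm(A21, B12, tau, leaf), spamm(A22, B22, tau, leaf))
    return join(C11, C12, C21, C22)

# Usage: a small matrix whose entries decay away from the diagonal.
n = 8
A = [[math.exp(-2.0 * abs(i - j)) for j in range(n)] for i in range(n)]
C = spamm(A, A, tau=1e-3)
```

With `tau = 0` nothing is culled and the sketch reduces to an ordinary recursive block multiply. For matrices with decay, far off-diagonal blocks have small norms, so a positive `tau` prunes most of the recursion tree, which is where the reduced complexity comes from.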
