Hierarchical Recursive Precision for Accelerating Symmetric Linear Solves on MXUs

Abstract

Symmetric positive-definite system solvers based on Cholesky factorization are fundamental to many scientific applications, such as climate modeling. We present a portable, nested recursive mixed-precision solver designed for Matrix Processing Units (MXUs), including NVIDIA Tensor Cores (H200) and AMD Matrix Cores (MI300X), that assigns low-precision FP16 arithmetic to large off-diagonal blocks, while preserving high precision on diagonal blocks to ensure numerical stability. The solver is implemented in Julia, providing a high-level, hardware-agnostic interface. We demonstrate up to a 5.07x speedup relative to the diagonal-precision vendor baseline, with 100x better accuracy than pure half precision on H200, providing higher accuracy than low-precision at higher speed than high-precision. Positive performance trends are also observed on MI300X, demonstrating broad applicability across GPUs.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…