Mixed precision solvers with half-precision floating point numbers for Lattice QCD on A64FX processor

Keigo Nitadori

doi:10.1145/3784828.3785398

Mixed precision solvers with half-precision floating point numbers for Lattice QCD on A64FX processor

Abstract

We investigate the use of half-precision floating-point numbers (FP16) in mixed-precision linear solvers for lattice QCD simulations. Since the emergence of GPUs for general-purpose, mixed-precision algorithms that combine single-precision (FP32) with double-precision (FP64) arithmetics have become widely used in this field and others. While FP32-based methods are now well established, we examine the practicality of using FP16. In this work, we introduce rescaling steps in both the outer iterative refinement step and the inner BiCGStab solver to avoid numerical instability. In our experiments with a simple Wilson kernel, the solver shows improved stability, and the additional iteration count compared to the FP64 version remains within 20\%, indicating that the FP16 version is practical for use. We believe that the proposed rescaling methods can also benefit other mixed precision preconditioners in avoiding underflows.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…