GPU-accelerated finite-temperature Lanczos method for spin Hamiltonians

Abstract

We present a GPU implementation of the finite-temperature Lanczos method (FTLM) for Heisenberg spin Hamiltonians that targets workstation hardware rather than distributed-memory clusters. The Hamiltonian action is evaluated matrix-free in a row-wise gather formulation. We introduce and compare two state-to-index strategies: a compressed lookup table (CLT), which reduces lookup memory by a factor of 16 relative to a full table while retaining a fixed, branch-light access pattern, and a GPU-adapted combinatorial-ranking scheme that removes the lookup table altogether. Numerical tests against FP64 CPU references show that FP32 GPU arithmetic changes heat capacities and magnetic susceptibilities by amounts several orders of magnitude below the stochastic uncertainty of the FTLM trace estimator at typical sample sizes. Benchmarks show speedups of up to about one order of magnitude over optimized multicore CPU calculations and demonstrate that Hilbert-space sectors of dimension ~10^8 are tractable on a single workstation GPU. The MATLAB/CUDA implementation, including example input files and benchmark scripts, is openly available at https://github.com/ghasdeke/ftlm-gpu (archived at DOI: 10.5281/zenodo.20378647) under the Apache-2.0 license.
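
To make the idea of a lookup-free state-to-index map concrete, the following minimal sketch ranks basis states of a fixed-magnetization sector with the textbook combinatorial number system. It is an illustration only and is not taken from the paper: the abstract does not specify how the GPU-adapted scheme is constructed, and the function names rank_state and unrank_state, the bit encoding (bit set = spin up), and the ordering by increasing integer value are all assumptions made here.

```python
from math import comb


def rank_state(state: int) -> int:
    """Index of `state` among all integers with the same number of set bits,
    ordered by increasing integer value (combinatorial number system).

    Assumption: each set bit encodes an up spin, so the popcount fixes the
    magnetization sector.
    """
    rank = 0
    n_seen = 0  # number of set bits encountered so far
    pos = 0     # current bit position, starting at the least significant bit
    while state:
        if state & 1:
            n_seen += 1
            # The n_seen-th set bit at position pos contributes C(pos, n_seen).
            rank += comb(pos, n_seen)
        state >>= 1
        pos += 1
    return rank


def unrank_state(rank: int, n_sites: int, n_up: int) -> int:
    """Inverse of rank_state for n_sites bits with exactly n_up bits set."""
    state = 0
    remaining = n_up
    for pos in range(n_sites - 1, -1, -1):
        if remaining == 0:
            break
        c = comb(pos, remaining)
        if rank >= c:
            # The highest remaining set bit sits at this position.
            state |= 1 << pos
            rank -= c
            remaining -= 1
    return state


if __name__ == "__main__":
    # Four sites, two up spins: the six states in increasing integer order.
    for s in (0b0011, 0b0101, 0b0110, 0b1001, 0b1010, 0b1100):
        print(f"{s:04b} -> {rank_state(s)}")
```

In this sketch the index is computed from binomial coefficients alone, so no table over the sector has to be stored; the price is a loop over bit positions per lookup. A GPU version would typically replace the calls to comb with a small precomputed binomial table, which is again an assumption rather than a description of the released code.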
