DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization

Abstract

We present DFT-FE 1.0, building on DFT-FE 0.6 [Comput. Phys. Commun. 246, 106853 (2020)], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching ~ 100,000 electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation -- via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency -- as well high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide-range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our implementation, which yields 20 × CPU-GPU speed-up by using GPU acceleration on hybrid CPU-GPU nodes. Notably, owing to the parallel-scaling of the GPU implementation, we obtain wall-times of 80-140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing ~ 6,000-15,000 electrons.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…