An implementation of hybrid parallel CUDA code for the hyperonic nuclear forces

Abstract

We present our recent effort to develop a GPGPU program that calculates 52 channels of Nambu-Bethe-Salpeter (NBS) wave functions in order to study baryon interactions, from nucleon-nucleon to ΞΞ, from lattice QCD. We adopt CUDA programming for multi-GPU execution within a hybrid parallel program based on MPI and OpenMP. The effective baryon block algorithm, which efficiently calculates a large number of NBS wave functions at a time, is briefly outlined, and three CUDA kernels are implemented to realize it on GPUs in the single-program multiple-data (SPMD) programming model. To parallelize over multiple GPUs, we take two approaches: dividing the time dimension and dividing the spatial dimensions. Performance is measured on the HA-PACS supercomputer at the University of Tsukuba, which includes NVIDIA M2090 and NVIDIA K20X GPUs. Strong-scaling and weak-scaling results measured on both the M2090 and K20X GPUs are presented. We find a distinct difference between the M2090 and the K20X in the sustained performance of particular kernel executions that utilize cudaStream objects.
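As an illustration of what dividing the time dimension across processes can look like on the host side, the following is a minimal C++ sketch. It is not taken from the paper: the names `TimeRange` and `time_slices_for_rank` are hypothetical, and the even block partition shown here is only an assumption about how time slices might be assigned to MPI ranks (and hence to GPUs).

```cpp
#include <cassert>

// Half-open range [begin, end) of time slices owned by one rank/GPU.
// Hypothetical helper type, introduced only for this illustration.
struct TimeRange {
    int begin;
    int end;
};

// Split n_t time slices as evenly as possible over n_ranks workers.
// The first (n_t % n_ranks) ranks each receive one extra slice, so the
// ranges are contiguous, non-overlapping, and cover [0, n_t) exactly.
inline TimeRange time_slices_for_rank(int n_t, int n_ranks, int rank) {
    assert(n_t >= 0 && n_ranks > 0 && rank >= 0 && rank < n_ranks);
    const int base = n_t / n_ranks;   // slices every rank gets
    const int extra = n_t % n_ranks;  // leftover slices to distribute
    const int begin = rank * base + (rank < extra ? rank : extra);
    const int end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

In an MPI program, each rank would call such a helper with its own rank index and then launch its kernels only for the time slices in the returned range; dividing the spatial dimensions would follow the same pattern applied per spatial axis.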
