Potential benefits of a block-space GPU approach for discrete tetrahedral domains
Abstract
The study of data-parallel domain re-organization and thread-mapping techniques is relevant because these techniques can increase the efficiency of GPU computations on spatial discrete domains with non-box-shaped geometry. In this work we study the potential benefits of applying a succinct data re-organization of a tetrahedral data-parallel domain of size $O(n^3)$ combined with an efficient block-space GPU map of the form $g: \mathbb{N} \to \mathbb{N}^3$. Results from the analysis suggest that, in theory, the combination of these two optimizations produces a significant performance improvement: the block-based data re-organization allows a coalesced one-to-one correspondence at local thread-space, while $g(\lambda)$ produces an efficient block-space spatial correspondence between groups of data and groups of threads, reducing the number of unnecessary threads from $O(n^3)$ to $O(n^2\rho^3)$, where $\rho$ is the linear block-size and typically $\rho^3 \ll n$. From the analysis, we obtained that a block-based succinct data re-organization can provide up to 2× improved performance over a linear data organization, while the map can be up to 6× more efficient than a bounding box approach. The results from this work can serve as a useful guide for more efficient GPU computation on tetrahedral domains found in spin lattice, finite element and special n-body problems, among others.
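To make the idea of a block-space map concrete, the following is a minimal Python sketch of one possible map from a linear block index to a 3D block coordinate inside a tetrahedron. It is not the map derived in the paper: it assumes the tetrahedral domain is the set of block coordinates with 0 <= x <= y <= z, and it inverts the tetrahedral and triangular numbers with a cube-root estimate plus integer correction. The names `block_map`, `tet`, `tri` and `bounding_box_ratio` are hypothetical and chosen only for illustration.

```python
from math import isqrt


def tri(k):
    """k-th triangular number: blocks in a 2D triangle with k rows."""
    return k * (k + 1) // 2


def tet(k):
    """k-th tetrahedral number: blocks in a tetrahedron with k layers."""
    return k * (k + 1) * (k + 2) // 6


def block_map(lam):
    """Map a linear block index lam to (x, y, z) with 0 <= x <= y <= z.

    Assumption: the tetrahedron is laid out layer by layer in z, and each
    layer is a triangle laid out row by row in y.
    """
    # Floating-point cube-root estimate of the layer, then exact correction.
    z = int(round((6 * lam) ** (1.0 / 3.0)))
    while tet(z) > lam:
        z -= 1
    while tet(z + 1) <= lam:
        z += 1
    rem = lam - tet(z)                   # index inside the triangular layer
    y = (isqrt(8 * rem + 1) - 1) // 2    # largest y with tri(y) <= rem
    x = rem - tri(y)                     # offset inside the row
    return x, y, z


def bounding_box_ratio(m):
    """Blocks launched by an m*m*m bounding box per block actually needed."""
    return m ** 3 / tet(m)


if __name__ == "__main__":
    for lam in range(5):
        print(lam, block_map(lam))
    print(bounding_box_ratio(64))
```

Under these assumptions, a tetrahedron with m blocks per side needs m(m+1)(m+2)/6 blocks, whereas a bounding box launches m^3, so the ratio approaches 6 as m grows. This is consistent with the up-to-6× figure quoted above, although the sketch only counts blocks and says nothing about the cost of evaluating the map on a real GPU.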