Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity
Abstract
MFEM is a widely used finite-element library, but its native linear-elasticity Partial Assembly (PA) path still applies an O((p+1)^6) contraction in the element operator, leaving the CPU operator-throughput sweet spot near p ≈ 2 in our baseline measurements. This work closes this implementation gap for MFEM linear elasticity on affine tensor-product hexahedral meshes by integrating four well-established tensor-product PA optimizations (sum factorization, Voigt notation, macro-kernel fusion, and slice-wise loop reorganization) into MFEM's native linear-elasticity PA path. The resulting operator is evaluated in high-order GMG-PCG solves using MFEM's geometric multigrid (GMG) components. On AMD EPYC 7713, the optimized operator achieves 7–83× kernel speedup and 3.6–16.8× end-to-end speedup across p ∈ {1, 2, 4, 8}. At fixed problem size, the kernel-time operator throughput peaks around p = 6 and remains high at p = 8, shifting the operator-throughput sweet spot to around p = 6. The same trend is reproduced on Huawei Kunpeng 920 (ARMv8.2). These results are accompanied by per-stage ablation and hardware-counter characterization; the implementation will be released on GitHub.
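To make the cost gap concrete, the following is a minimal sketch of sum factorization on a single hexahedral element. It is not MFEM code and not the paper's kernel: it assumes a scalar field and one hypothetical 1D matrix B of size (p+1)×(p+1) reused in all three directions, whereas the elasticity operator works on vector fields and gradients. All function names are illustrative. The dense form performs one triple sum per output entry, which costs O((p+1)^6); the factorized form contracts one index at a time, which costs O((p+1)^4).

```python
import random


def apply_naive(B, u):
    """Dense contraction, O(n^6) with n = p + 1:
    v[i][j][k] = sum over a, b, c of B[i][a] * B[j][b] * B[k][c] * u[a][b][c].
    """
    n = len(B)
    v = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                s = 0.0
                for a in range(n):
                    for b in range(n):
                        for c in range(n):
                            s += B[i][a] * B[j][b] * B[k][c] * u[a][b][c]
                v[i][j][k] = s
    return v


def apply_sum_factorized(B, u):
    """Same result via three one-index contractions, O(n^4) each."""
    n = len(B)
    r = range(n)
    # Pass 1: contract the first index, t1[i][b][c] = sum_a B[i][a] * u[a][b][c].
    t1 = [[[sum(B[i][a] * u[a][b][c] for a in r) for c in r] for b in r] for i in r]
    # Pass 2: contract the second index, t2[i][j][c] = sum_b B[j][b] * t1[i][b][c].
    t2 = [[[sum(B[j][b] * t1[i][b][c] for b in r) for c in r] for j in r] for i in r]
    # Pass 3: contract the third index, v[i][j][k] = sum_c B[k][c] * t2[i][j][c].
    return [[[sum(B[k][c] * t2[i][j][c] for c in r) for k in r] for j in r] for i in r]


def max_abs_diff(x, y):
    """Largest entrywise difference between two n x n x n nested lists."""
    n = len(x)
    return max(abs(x[i][j][k] - y[i][j][k])
               for i in range(n) for j in range(n) for k in range(n))


def contraction_counts(p):
    """Multiply-add counts per element: (dense, sum-factorized)."""
    n = p + 1
    return n ** 6, 3 * n ** 4


if __name__ == "__main__":
    random.seed(0)
    p = 2
    n = p + 1
    B = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    u = [[[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
         for _ in range(n)]
    print("max difference:", max_abs_diff(apply_naive(B, u), apply_sum_factorized(B, u)))
    for q in (1, 2, 4, 8):
        print("p =", q, "dense vs factorized multiply-adds:", contraction_counts(q))
```

In this simplified count the ratio of dense to factorized work is (p+1)^2 / 3, so the gap widens with p. The measured speedups quoted in the abstract also depend on the other three optimizations and on memory behavior, which this sketch does not model.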