An Adaptive Self-Scheduling Loop Scheduler

Abstract

Many irregular shared-memory parallel applications, such as sparse linear algebra and graph algorithms, depend on efficient fork-join loop scheduling (LS), even though the work per loop iteration can vary greatly with the application and the input. Because of its importance, many different methods (e.g., workload-aware self-scheduling) and parameters (e.g., chunk size) have been explored, but achieving reasonable performance with them requires expert prior knowledge about the application and input. This work proposes a new LS method that requires little to no expert knowledge to achieve speedups close to those of tuned LS methods. It does so by self-managing chunk size based on a heuristic of workload variance and by using work-stealing. The method is implemented in libgomp for testing. It is evaluated against OpenMP's guided, dynamic, and taskloop methods, as well as BinLPT and generic work-stealing, on a set of applications comprising a synthetic benchmark, breadth-first search, K-Means, the molecular dynamics code LavaMD, and sparse matrix-vector multiplication. On a 28-thread Intel system, the proposed method is the only one that always ranks among the top three LS methods. On average across all applications, it is within 5.4% of the best method, and it even outperforms the other LS methods for breadth-first search and K-Means.
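For context, the OpenMP baselines named above are selected with the standard `schedule` clause, and the chunk size is the parameter a user would normally tune by hand. The sketch below is not the proposed method; it only illustrates, under assumptions, how `schedule(dynamic, chunk)` and `schedule(guided)` are applied to a loop whose per-iteration cost varies. The workload and the names `work`, `run_dynamic`, and `run_guided` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical irregular workload: iteration i performs (i % 64) + 1
 * inner steps, so the cost per iteration varies by up to 64x. */
static long work(size_t i) {
    long steps = 0;
    for (size_t k = 0; k <= i % 64; ++k)
        steps += 1;
    return steps;
}

/* Dynamic self-scheduling: idle threads grab `chunk` iterations at a time.
 * A small chunk balances load better; a large chunk lowers scheduling cost. */
long run_dynamic(size_t n, int chunk) {
    assert(chunk > 0); /* OpenMP requires a positive chunk size */
    long total = 0;
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* Guided self-scheduling: chunk sizes start large and shrink as the
 * remaining iteration count decreases. */
long run_guided(size_t n) {
    long total = 0;
    #pragma omp parallel for schedule(guided) reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Both functions return the same total regardless of the schedule; only the distribution of iterations across threads differs. When compiled without OpenMP support (e.g., without `-fopenmp` on GCC), the pragmas are ignored and the loops run serially.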
