Auto-Tuned Per-Loop Compilation

funded by: LLNL
funding level: $50,000 + $21,202 (phase 2)
duration: 01/24/2018 - 01/31/2019, phase 2: 10/04/2018 - 08/30/2019

HPC applications require careful tuning to achieve close to peak performance on cutting-edge hardware platforms. This work hypothesizes that traditional per-module optimizations fall short of fully exploiting a compiler's capabilities, even when interprocedural optimizations complement local and global ones.

This project proposes to investigate the viability of separately compiling major loops as part of an auto-tuning effort. Such an ensemble of loop units, when linked together, has the potential to improve not only single-loop but also overall application performance, thereby edging closer to peak performance for a given platform.
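
As a rough illustration of why per-loop selection might help, the sketch below compares choosing one flag set for a whole module against choosing the fastest flag set for each loop. It is not the project's implementation. The loop names, the timing numbers, and the helper functions are hypothetical, and the sketch assumes loop runtimes are independent and additive, which is a simplification since separately compiled loops may interact once linked.

```python
from typing import Dict, Tuple

# Hypothetical timings in seconds: loop name -> {compiler flag set -> runtime}.
# These values are invented for illustration; they are not measured results.
TIMINGS: Dict[str, Dict[str, float]] = {
    "loop_a": {"-O2": 4.0, "-O3": 3.0, "-O3 -funroll-loops": 3.5},
    "loop_b": {"-O2": 2.0, "-O3": 2.5, "-O3 -funroll-loops": 1.5},
    "loop_c": {"-O2": 1.0, "-O3": 1.25, "-O3 -funroll-loops": 1.5},
}


def best_per_module(timings: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Pick the single flag set that minimizes the summed runtime of all loops."""
    flag_sets = list(next(iter(timings.values())).keys())
    totals = {
        flags: sum(per_flag[flags] for per_flag in timings.values())
        for flags in flag_sets
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


def best_per_loop(
    timings: Dict[str, Dict[str, float]]
) -> Tuple[Dict[str, str], float]:
    """Pick the fastest flag set for each loop independently and sum the runtimes."""
    choice = {
        loop: min(per_flag, key=per_flag.get) for loop, per_flag in timings.items()
    }
    total = sum(timings[loop][flags] for loop, flags in choice.items())
    return choice, total


if __name__ == "__main__":
    module_flags, module_total = best_per_module(TIMINGS)
    loop_flags, loop_total = best_per_loop(TIMINGS)
    print("per-module:", module_flags, module_total)
    print("per-loop:  ", loop_flags, loop_total)
```

Under the additivity assumption, the per-loop total can never exceed the per-module total, because each loop's minimum is at most its runtime under any single shared flag set. Whether that bound holds after linking real loop units is exactly what an auto-tuner would have to measure.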

Publications:

"BarrierFinder: Recognizing Ad Hoc Barriers" by Tao Wang, Xiao Yu, Zhengyi Qiu, Guoliang Jin, Frank Mueller in Empirical Software Engineering (EMSE), No. 9862, accepted Jul 2020.
"CodeSeer: Input-dependent Code Variants Selection Via Machine Learning" by Tao Wang, Nikhil Jain, David Boehme, David Beckingsale, Frank Mueller and Todd Gamblin in International Conference on Supercomputing (ICS), Jun 2020.
"BarrierFinder: Recognizing Ad Hoc Barriers" by Tao Wang, Xiao Yu, Zhengyi Qiu, Guoliang Jin, Frank Mueller in International Conference on Software Maintenance and Evolution (ICSME), Sep/Oct 2019.
"FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation" by Tao Wang, Nikhil Jain, David Beckingsale, David Boehme, Frank Mueller, Todd Gamblin in International Conference on Parallel Processing (ICPP), Aug 2019, Best Paper Candidate.

Theses:

"Compiler-based Auto-tuning and Synchronization Validation for HPC Applications" by Tao Wang, Ph.D. Thesis, North Carolina State University, Dec 2019 (last known position: TBD)