Auto-Tuned Per-Loop Compilation

HPC applications require careful tuning to achieve close to peak performance on cutting-edge hardware platforms. This work hypothesizes that traditional per-module optimization falls short of fully exploiting a compiler's capabilities, even when interprocedural optimizations complement local and global ones.

This project proposes to investigate the viability of separately compiling an application's major loops as part of an auto-tuning effort. Such an ensemble of loop units, when linked together, has the potential to improve not only single-loop but also overall application performance, thereby edging closer to peak performance on a given platform.
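
The selection step behind this idea can be illustrated with a minimal sketch. It is not the project's implementation. It assumes that each major loop has already been extracted into its own compilation unit and timed under several candidate flag sets. The function names, loop names, and timing values below are hypothetical and made up for illustration; they are not measurements.

```python
from typing import Dict, Tuple

# Hypothetical timing table: timings[loop_unit][flag_set] = runtime in seconds.
Timings = Dict[str, Dict[str, float]]


def best_per_loop(timings: Timings) -> Dict[str, str]:
    """For each loop unit, pick the flag set with the lowest runtime."""
    return {
        loop: min(by_flags, key=by_flags.get)
        for loop, by_flags in timings.items()
    }


def best_per_module(timings: Timings) -> Tuple[str, float]:
    """Pick the single flag set that minimizes the summed runtime of all loops."""
    # Only flag sets that were timed for every loop can be compared.
    common = set.intersection(*(set(by_flags) for by_flags in timings.values()))
    totals = {
        flags: sum(by_flags[flags] for by_flags in timings.values())
        for flags in common
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


def total_runtime(timings: Timings, choice: Dict[str, str]) -> float:
    """Sum the runtimes of all loops under a per-loop flag assignment."""
    return sum(timings[loop][flags] for loop, flags in choice.items())


# Illustrative, made-up numbers (not measured results).
sample: Timings = {
    "loop_a": {"-O2": 4.0, "-O3": 3.0, "-O3 -funroll-loops": 3.5},
    "loop_b": {"-O2": 2.0, "-O3": 2.5, "-O3 -funroll-loops": 1.5},
}

per_loop_choice = best_per_loop(sample)
per_loop_total = total_runtime(sample, per_loop_choice)
module_flags, module_total = best_per_module(sample)
```

In this toy model the per-loop total can never exceed the best per-module total, because each loop is free to use the flag set the module-wide choice would have forced on it. The model treats loops as independent, though. It ignores effects that appear only once the separately compiled units are linked together, such as lost cross-unit inlining, which is part of what the proposed investigation would need to measure.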
