Improving Memory Performance on Fused Architectures through Compiler and Runtime Innovations

Integrated GPUs feature shared caches and a common memory interconnect with multicore CPUs, which intensify resource contention in the memory hierarchy. This creates new challenges for data locality, task partitioning and scheduling, as well as program transformations. Most significantly, a program running on GPU warps and CPU cores may adversely affect performance and power of one another.

The objective of this work is to understand these novel implications of fused architectures by studying their effects, qualifying their causes and quantifying the impacts on performance and energy efficiency. We propose to advance the state-of-the-art by creating spheres of isolation between CPU and GPU execution via novel systems mechanisms and compiler transformations that reduce cross-boundary contention with respect to shared hardware resources. This synergy between systems and compiler techniques has the potential to significantly improve performance and power guarantees for co-scheduling program fragments on fused architectures.

Publications: Theses:
"This material is based upon work supported by the National Science Foundation under Grant No. 1525609."

"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."