Improving Memory Performance on Fused Architectures through Compiler and Runtime Innovations

funded by: NSF (award abstract)
funding level: $470,000
duration: 08/01/2015 - 07/31/2018 (no-cost extension until 07/31/2020)
PI: Xipeng Shen, Co-PI: Frank Mueller

Integrated GPUs share caches and a common memory interconnect with multicore CPUs, which intensifies resource contention in the memory hierarchy. This creates new challenges for data locality, task partitioning and scheduling, as well as program transformations. Most significantly, programs running on GPU warps and CPU cores may adversely affect one another's performance and power.

The objective of this work is to understand these novel implications of fused architectures by studying their effects, qualifying their causes and quantifying the impacts on performance and energy efficiency. We propose to advance the state-of-the-art by creating spheres of isolation between CPU and GPU execution via novel systems mechanisms and compiler transformations that reduce cross-boundary contention with respect to shared hardware resources. This synergy between systems and compiler techniques has the potential to significantly improve performance and power guarantees for co-scheduling program fragments on fused architectures.

Publications:

"Orchestrating Fault Prediction with Live Migration and Checkpointing" by Subhendu Behera, Lipeng Wan, Frank Mueller, Matthew Wolf, Scott Klasky in High-Performance Parallel and Distributed Computing (HPDC), Jun 2020.
"CodeSeer: Input-dependent Code Variants Selection Via Machine Learning" by Tao Wang, Nikhil Jain, David Boehme, David Beckingsale, Frank Mueller and Todd Gamblin in International Conference on Supercomputing (ICS), Jun 2020.
"Aarohi: Making Real-Time Node Failure Prediction Feasible" by A. Das, F. Mueller, B. Rountree, in International Parallel and Distributed Processing Symposium (IPDPS), May 2020.
"Uncore Power Scavenger: A Runtime for Uncore Power Conservation on HPC Systems" by Neha Gholkar, Frank Mueller, Barry Rountree, in Supercomputing (SC), Nov 2019.
"Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using Intel Optane DC Persistent Memory Modules" by Onkar Patil, Latchesar Ionkov, Jason Lee, Frank Mueller, Michael Lang, in International Symposium on Memory Systems (MEMSYS), Sep/Oct 2019.
"Evaluating Burst Buffer Placement in HPC Systems" by Harsh Khetawat, Christopher Zimmer, Frank Mueller, Scott Atchley, Sudharshan Vazhkudai, Misbah Mubarak in Cluster, Sep 2019, Best Paper Award.
"FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation" by Tao Wang, Nikhil Jain, David Beckingsale, David Boehme, Frank Mueller, Todd Gamblin in International Conference on Parallel Processing (ICPP), Aug 2019, Best Paper Candidate.
"End-to-end Resilience for HPC Applications" by A. Rezaei, H. Khetawat, O. Patil, F. Mueller, P. Hargrove, E. Roman in International Supercomputing Conference (ISC), Jun 2019, Gauss Award for "most outstanding research paper submitted to ISC".
"CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail" by Sarthak Kukreti, Frank Mueller, in IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Dec 2018, Best Paper Award.
"Co-Scheduling on Fused CPU-GPU Architectures with Shared Last Level Caches" by Marvin Damschen, Frank Mueller, Joerg Henkel in Conference on Compiler, Architecture and Synthesis on Embedded Systems (CASES'18), Oct 2018, published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), DOI 10.1109/TCAD.2018.2857042.
"Chameleon: Online Clustering of MPI Program Traces" by A. Bahmani, F. Mueller, in International Parallel and Distributed Processing Symposium (IPDPS), May 2018, DOI 10.1145/2597652.2597676.
"Controller-Aware Memory Coloring for Multicore Real-Time Systems" by Xing Pan, Frank Mueller in Symposium on Applied Computing (SAC), Apr 2018, DOI 10.1145/3167132.3167196.
"Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache" by Tyler Stocksdale, Mu-Tien Chang, Hongzhong Zheng, Frank Mueller in Petascale Data Storage Workshop, Nov 2017.
"Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience" by Onkar Patil, Saurabh Hukerikar, Frank Mueller, Christian Engelmann, Refereed Work-in-Progress at Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS'17), Nov 2017.
"Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience" by Onkar Patil, Saurabh Hukerikar, Frank Mueller, Christian Engelmann, Refereed Poster at ACM SRC Supercomputing (SC'17), Nov 2017.
"TintMalloc: Reducing Memory Access Divergence via Controller-Aware Coloring" by Xing Pan, Yasaswini Gownivaripalli, Frank Mueller in International Parallel and Distributed Processing Symposium (IPDPS), May 2016.

Theses:

"Adaptive Multi-level Checkpointing On Modern High Performance Computing Systems" by Subhendu Behera, M.S. Thesis, North Carolina State University, May 2020 (last known position: Ph.D. student, NCSU)
"Providing DRAM Predictability for Real-Time Systems and Beyond" by X. Pan, Ph.D. Thesis, North Carolina State University, May 2017 (last known position: Huawei, China)

"This material is based upon work supported by the National Science Foundation under Grant No. 1525609."

"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."