Improving Memory Performance on Fused Architectures through Compiler and Runtime Innovations
- funded by: NSF (award abstract)
- funding level: $470,000
- duration: 08/01/2015 - 07/31/2018 (no-cost extension until 07/31/2020)
- PI: Xipeng Shen, Co-PI: Frank Mueller
Integrated GPUs feature shared caches and a common memory interconnect
with multicore CPUs, which intensify resource contention in the memory
hierarchy. This creates new challenges for data locality, task
partitioning and scheduling, as well as program transformations. Most
significantly, programs running on GPU warps and on CPU cores may
adversely affect one another's performance and power.
The objective of this work is to understand these novel implications
of fused architectures by studying their effects, qualifying their
causes and quantifying the impacts on performance and energy
efficiency. We propose to advance the state-of-the-art by creating
spheres of isolation between CPU and GPU execution via novel systems
mechanisms and compiler transformations that reduce cross-boundary
contention with respect to shared hardware resources. This synergy
between systems and compiler techniques has the potential to
significantly improve performance and power guarantees for
co-scheduling program fragments on fused architectures.
Publications:
- Orchestrating Fault Prediction with Live Migration and Checkpointing by
Subhendu Behera, Lipeng Wan, Frank Mueller, Matthew Wolf, Scott Klasky in
High-Performance Parallel and Distributed Computing (HPDC), Jun
2020.
- CodeSeer: Input-dependent Code Variants Selection Via Machine Learning
by Tao Wang, Nikhil Jain, David Boehme, David Beckingsale, Frank Mueller and Todd Gamblin
in International Conference on Supercomputing (ICS), Jun 2020.
- "Aarohi: Making Real-Time Node Failure Prediction Feasible"
by A. Das, F. Mueller, B. Rountree, in
International Parallel and Distributed Processing Symposium (IPDPS), May 2020.
- "Uncore Power Scavenger: A Runtime for Uncore Power Conservation
on HPC Systems"
by Neha Gholkar, Frank Mueller, Barry Rountree,
in Supercomputing (SC), Nov 2019.
- "Performance characterization of a DRAM-NVM
hybrid memory architecture for HPC applications
using Intel Optane DC Persistent Memory Modules"
by Onkar Patil, Latchesar Ionkov, Jason Lee, Frank Mueller, Michael Lang,
in International Symposium on Memory Systems (MEMSYS), Sep/Oct 2019.
- "Evaluating Burst Buffer Placement in HPC Systems" by
Harsh Khetawat, Christopher Zimmer, Frank Mueller, Scott Atchley,
Sudharshan Vazhkudai, Misbah Mubarak in Cluster, Sep 2019, Best Paper Award.
- "FuncyTuner: Auto-tuning Scientific Applications With Per-loop
Compilation" by Tao Wang, Nikhil Jain, David
Beckingsale, David Boehme, Frank
Mueller, Todd Gamblin in International Conference
on Parallel Processing (ICPP), Aug 2019, Best Paper Candidate.
- End-to-end Resilience for HPC Applications
by A. Rezaei, H. Khetawat, O. Patil, F. Mueller, P. Hargrove, E. Roman
in International Supercomputing Conference (ISC), Jun 2019.
Gauss Award for "most outstanding research paper submitted to ISC"
- "CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail"
by Sarthak Kukreti, Frank Mueller,
in IEEE/ACM International Conference on Big Data Computing,
Applications and Technologies (BDCAT), Dec 2018, Best Paper Award.
- "Co-Scheduling on Fused CPU-GPU Architectures with Shared Last Level
Caches" by Marvin Damschen, Frank Mueller, Joerg
Henkel in Conference on Compiler, Architecture and Synthesis on
Embedded Systems (CASES'18), Oct 2018, published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), DOI 10.1109/TCAD.2018.2857042.
- "Chameleon: Online Clustering of MPI Program Traces"
by A. Bahmani, F. Mueller, in
International Parallel and Distributed Processing Symposium (IPDPS), May 2018,
DOI 10.1145/2597652.2597676.
- Controller-Aware Memory Coloring for Multicore Real-Time Systems
by Xing Pan, Frank Mueller in
Symposium on Applied Computing (SAC), Apr 2018, DOI 10.1145/3167132.3167196.
- "Architecting HBM as a
High Bandwidth, High Capacity, Self-Managed Last-Level Cache"
by Tyler Stocksdale, Mu-Tien Chang, Hongzhong
Zheng, Frank Mueller in Petascale Data Storage Workshop, Nov 2017.
- Exploring Use-cases for Non-Volatile
Memories in support of HPC Resilience. Onkar Patil, Saurabh
Hukerikar, Frank Mueller, Christian Engelmann. Refereed
Work-in-Progress at Joint International Workshop on Parallel Data
Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS'17),
Nov 2017.
- Exploring
Use-cases for Non-Volatile Memories in support of HPC
Resilience. Onkar Patil, Saurabh Hukerikar, Frank Mueller,
Christian Engelmann. Refereed Poster at ACM SRC Supercomputing (SC'17),
Nov 2017.
- TintMalloc:
Reducing Memory Access Divergence via Controller-Aware Coloring
by Xing Pan, Yasaswini Gownivaripalli, Frank Mueller in
International Parallel and Distributed Processing Symposium (IPDPS), May 2016.
Theses:
"This material is based upon work supported by the National Science Foundation under Grant No. 1525609."
"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."