Hobbes: OS and Runtime Support for Application Composition
- funded by: SNL
- funding level: $300,000
- duration: 10/24/2013 - 10/23/2016 (no-cost extension until 09/30/2017)
This project, under the umbrella of Hobbes, intends to deliver
an operating system and runtime system (OS/R) environment for
extreme-scale scientific computing.
We will develop the necessary OS/R interfaces and low-level system
services to support isolation and sharing functionality for designing
and implementing applications as well as performance and correctness
tools.
We propose a lightweight OS/R system
with the flexibility to custom build runtimes for any particular
purpose. Each component executes in its own "enclave" with a
specialized runtime and isolation properties. A global runtime system
provides the software required to compose applications out of a
collection of enclaves, join them through secure and low-latency
communication, and schedule them to avoid contention and maximize
resource utilization.
The primary deliverable of this project is a full OS/R stack based on
the Kitten operating system and Palacios virtual machine monitor that
can be delivered to vendors for further enhancement and
optimization.
Publications:
- "FuncyTuner: Auto-tuning Scientific Applications With Per-loop
Compilation" by Tao Wang, Nikhil Jain, David Beckingsale,
David Boehme, Frank Mueller, Todd Gamblin in International Conference
on Parallel Processing (ICPP), Aug 2019.
- "Mini-Ckpts: Surviving OS Failures in Persistent Memory"
by David Fiala, Frank Mueller, Kurt Ferreira, Christian Engelmann
in International Conference on Supercomputing (ICS), Jun 2016.
- "A Numerical Soft Fault Model for Iterative Linear Solvers" by
James Elliott, Mark Hoemmen, Frank Mueller in
High-Performance Parallel and Distributed Computing (HPDC), Jun
2015, pages 271-274.
- "Evaluating the Impact of SDC on the GMRES Iterative Solver"
by James Elliott, Mark Hoemmen, Frank Mueller in
International Parallel and Distributed Processing Symposium
(IPDPS), May 2014, pages 1193-1202.
- "Exploiting Data Representation for Fault Tolerance"
by James Elliott, Mark Hoemmen, Frank Mueller in
Workshop on Latest Advances in Scalable Algorithms for
Large-Scale Systems (ScalA), Nov 2014.
- "Resilience in Numerical Methods: A Position on
Fault Models and Methodologies" by J. Elliott, M. Hoemmen, F. Mueller,
invited talk at SIAM Conference on
Computational Science and Engineering, Feb 2014.
- "Tolerating Silent Data Corruption in Opaque Preconditioners"
by J. Elliott, M. Hoemmen, F. Mueller, Computing Research Repository, Feb 2014.
- "Model Driven Analysis of Faulty IEEE-754 Scalars"
by James Elliott, Mark Hoemmen, Frank Mueller
in TR 2015-9, Dept. of Computer Science, North Carolina State
University, Nov 2015.
- "Skeptical Programming and Selective Reliability" by
James Elliott, Mark Hoemmen, Frank Mueller, refereed poster at Supercomputing, Nov 2014.
Theses: