Hobbes: OS and Runtime Support for Application Composition
- funded by: SNL
- funding level: $300,000
- duration: 10/24/2013 - 10/23/2016 (no-cost extension until 09/30/2017)
This project, under the umbrella of Hobbes, intends to deliver
an operating system and runtime system (OS/R) environment for
extreme-scale scientific computing.
We will develop the necessary OS/R interfaces and low-level system
services to support isolation and sharing functionality for designing
and implementing applications as well as performance and correctness
We propose a lightweight OS/R system
with the flexibility to custom build runtimes for any particular
purpose. Each component executes in its own "enclave" with a
specialized runtime and isolation properties. A global runtime system
provides the software required to compose applications out of a
collection of enclaves, join them through secure and low-latency
communication, and schedule them to avoid contention and maximize
The primary deliverable of this project is a full OS/R stack based on
the Kitten operating system and Palacios virtual machine monitor that
can be delivered to vendors for further enhancement and
- "FuncyTuner: Auto-tuning Scientific Applications With Per-loop
Compilation" by Tao Wang, Nikhil Jain, David Beckingsale,
David Boehme, Frank Mueller, Todd Gamblin in International
Conference on Parallel Processing (ICPP), Aug 2019.
- Mini-Ckpts: Surviving
OS Failures in Persistent Memory
by David Fiala, Frank Mueller, Kurt Ferreira
in International Conference on Supercomputing (ICS), Jun 2016.
- A Numerical Soft Fault
Model for Iterative Linear Solvers by
James Elliott, Mark Hoemmen, Frank Mueller in
High-Performance Parallel and Distributed Computing (HPDC), Jun
2015, pages 271-274.
- Evaluating the Impact of SDC on the GMRES Iterative Solver by James Elliott, Mark Hoemmen, Frank Mueller in
International Parallel and Distributed Processing Symposium
(IPDPS), May 2014, pages 1193-1202.
- "Exploiting Data Representation for Fault Tolerance"
by James Elliott, Mark Hoemmen, and Frank Mueller,
Workshop on Latest Advances in Scalable Algorithms for
Large-Scale Systems (ScalA), Nov 2014.
- "Resilience in Numerical Methods: A Position on
Fault Models and Methodologies" by J. Elliott, M. Hoemmen, F. Mueller,
invited talk at SIAM Conference on
Computational Science and Engineering, Feb 2014.
- "Tolerating Silent Data Corruption in Opaque Preconditioners" by J. Elliott, M. Hoemmen, F. Mueller, Computing Research Repository, Feb 2014.
- "Model Driven Analysis of Faulty IEEE-754 Scalars" by James Elliott, Mark Hoemmen, Frank Mueller
in TR 2015-9, Dept. of Computer Science, North Carolina State
University, Nov 2015.
- "Skeptical Programming and Selective Reliability" by
James Elliott, Mark Hoemmen, Frank Mueller, refereed poster at Supercomputing, Nov 2014.