ARC: A Root Cluster for Research into Scalable Computer Systems

funded by: NSF (award abstract)
funding level: $549,999
duration: 03/01/2010 - 02/28/2013
PIs/co-PIs: Frank Mueller, Vincent Freeh, Helen Gu, Xuxian Jiang, Xiaosong Ma

Scalability is one of the key challenges to computing with hundreds if not thousands of processors. Yet, testing software at scale with hundreds of processing cores is impossible if system software with privileged access rights needs to be modified. The inability to change system software at will in large-scale computing installations thus impedes progress in system software.

This project creates a mid-size computational infrastructure, called ARC (A Root Cluster), that directly supports research into scalability for system-level software solutions. ARC temporarily grants users administrator (root) rights and allows them to replace arbitrary components of the software stack. Such replacements range from entire operating systems through drivers and kernel modules to runtime libraries, middleware, and system tools.

ARC ultimately enables a multitude of systems research directions to be assessed at scale that could not otherwise be pursued. Through ARC, methodologies for scalability of experimental system software in various institutional projects and beyond can be explored and systematically improved. ARC is positioned to benefit the software systems community, and indirectly science in general, through this assessment of system software requirements at scale.

Links:

ARC infrastructure

Publications:

"Aarohi: Making Real-Time Node Failure Prediction Feasible" by A. Das, F. Mueller, R. Rountree, in International Parallel and Distributed Processing Symposium (IPDPS), May 2020.
"Uncore Power Scavenger: A Runtime for Uncore Power Conservation on HPC Systems" by Neha Gholkar, Frank Mueller, Barry Rountree, in Supercomputing (SC), Nov 2019.
"BarrierFinder: Recognizing Ad Hoc Barriers" by Tao Wang, Xiao Yu, Zhengyi Qiu, Guoliang Jin, Frank Mueller in International Conference on Software Maintenance and Evolution (ICSME), Sep/Oct 2019.
"Evaluating Burst Buffer Placement in HPC Systems" by Harsh Khetawat, Christopher Zimmer, Frank Mueller, Scott Atchley, Sudharshan Vazhkudai, Misbah Mubarak in Cluster, Sep 2019, Best Paper Award.
"FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation" by Tao Wang, Nikhil Jain, David Beckingsale, David Boehme, Frank Mueller, Todd Gamblin in International Conference on Parallel Processing (ICPP), Aug 2019, Best Paper Candidate.
End-to-end Resilience for HPC Applications by A. Rezaei, H. Khetawat, O. Patil, F. Mueller, P. Hargrove, E. Roman in International Supercomputing Conference (ISC), Jun 2019. Gauss Award for "most outstanding research paper submitted to ISC"
The Colored Refresh Server for DRAM by Xing Pan, Frank Mueller in IEEE International Symposium on Real-Time Computing (ISORC), May 2019.
"CloneHadoop: Process Cloning to Reduce Hadoop's Long Tail" by Sarthak Kukreti, Frank Mueller, in IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), Dec 2018, pages (accepted), Best Paper Award.
"Doomsday: Predicting Which Node Will Fail When on Supercomputers" by Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman, Scott Baden, in Supercomputing (SC), Nov 2018, pages (accepted), Best Paper Candidate.
"A Failure Recovery Protocol for Software-Defined Real-Time Networks" by Tao Qian, Frank Mueller in Conference on Compiler, Architecture and Synthesis on Embedded Systems (CASES'18), Oct 2018, published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), DOI 10.1109/TCAD.2018.2857299.
PShifter: Feedback-based Dynamic Power Shifting within HPC Jobs for Performance by Neha Gholkar, Frank Mueller, Barry Rountree in High-Performance Parallel and Distributed Computing (HPDC), Jun 2018, pages 106-117.
Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC by Anwesha Das, Frank Mueller, Charles Siegel, Abhinav Vishnu in High-Performance Parallel and Distributed Computing (HPDC), Jun 2018, pages 40-51.
"Chameleon: Online Clustering of MPI Program Traces" by A. Bahmani, F. Mueller, in International Parallel and Distributed Processing Symposium (IPDPS), May 2018, DOI 10.1145/2597652.2597676.
Controller-Aware Memory Coloring for Multicore Real-Time Systems by Xing Pan, Frank Mueller in Symposium on Applied Computing (SAC), Apr 2018, DOI 10.1145/3167132.3167196.
"KeyValueServe: Design and Performance Analysis of a Multi-Tenant Data Grid as a Cloud Service" by A. Das, A. Iyengar, F. Mueller in Concurrency and Computation: Practice and Experience (CCPE), V 30, No 14, Jan 2018 (accepted), pages 1-22, DOI 10.1002/cpe.4424.
"DINO: Divergent Node Cloning for Sustained Redundancy in HPC" by A. Rezaei, F. Mueller, P. Hargrove, E. Roman in Journal of Parallel and Distributed Computing (JPDC), V 109, No C, Jul 2017, pages 350-362, DOI 10.1016/j.jpdc.2017.06.010.
"Scalable Communication Event Tracing via Clustering" by A. Bahmani, F. Mueller in Journal of Parallel and Distributed Computing (JPDC), V 109, No C, Jul 2017, pages 230-244, DOI 10.1016/j.jpdc.2017.06.008.
A Linux Real-Time Packet Scheduler for Reliable Static SDN Routing by T. Qian, Frank Mueller and Yufeng Xin in Euromicro Conference on Real-Time Systems (ECRTS), Jul 2017, Outstanding Paper Award.
ScalaIOExtrap: Elastic I/O Tracing and Extrapolation by Xiaoqing Luo, Frank Mueller, Philip Carns, Jonathan Jenkins, Robert Latham, Robert Ross and Shane Snyder in International Parallel and Distributed Processing Symposium (IPDPS), May 2017.
"Efficient Clustering for Ultra-Scale Application Tracing" by A. Bahmani, F. Mueller in Journal of Parallel and Distributed Computing (JPDC), V ??, No ?, Aug 2016, pages ???, DOI 10.1016/j.jpdc.2016.08.001, accepted.
Benchmark Generation and Simulation at Extreme Scale by Mahesh Lagadapati Chandu, Frank Mueller, Christian Engelmann in International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Sep 2016.
FlipSphere: A Software-based DRAM Error Detection and Correction Library for HPC by David Fiala, Frank Mueller, Kurt Ferreira in International Symposium on Distributed Simulation and Real Time Applications (DS-RT), Sep 2016.
Power Tuning HPC Jobs on Power-Constrained Systems by Neha Gholkar, Frank Mueller, Barry Rountree in International Conference on Parallel Architecture and Compilation Techniques (PACT), Sep 2016.
Hybrid MPI/OpenMP Programming on the Tilera Manycore Architecture by Vishwanathan Chandru, Frank Mueller in International Conference on High Performance Computing & Simulation (HPCS), Jul 2016.
A Resilient Software Infrastructure for Wide-Area Measurement Systems by Tao Qian, Hang Xu, Aranya Chakrabortty, Frank Mueller, Yufeng Xin in IEEE Power & Energy Society General Meeting, Jul 2016.
Efficient and Predictable Group Communication for Manycore NoCs by Karthik Yagna, Onkar Patil, Frank Mueller in International Supercomputing Conference (ISC), Jun 2016.
Distributed Job Allocation for Large-Scale Manycores by Subramanian Ramachandran, Frank Mueller in International Supercomputing Conference (ISC), Jun 2016.
Mini-Ckpts: Surviving OS Failures in Persistent Memory by David Fiala, Frank Mueller, Kurt Ferreira, Christian Engelmann in International Conference on Supercomputing (ICS), Jun 2016.
TintMalloc: Reducing Memory Access Divergence via Controller-Aware Coloring by Xing Pan, Yasaswini Gownivaripalli, Frank Mueller in International Parallel and Distributed Processing Symposium (IPDPS), May 2016.
"Exploiting Data Representation for Fault Tolerance" by J. Elliott, M. Hoemmen, F. Mueller, Journal of Computational Science, Vol. 14, May 2016, pages 51-60.
Reducing NoC and Memory Contention for Manycores by V. Chandru, F. Mueller in Architecture of Computing Systems (ARCS), Apr 2016.
OpenACC directive-based GPU acceleration of an implicit reconstructed discontinuous Galerkin method for compressible flows on 3D unstructured grids by J. Lou, Y. Xia, L. Luo, H. Luo, J. R. Edwards, F. Mueller in AIAA SciTech, Jan 2016.
ACURDION: An Adaptive Clustering-based Algorithm for Tracing Large-scale MPI Applications by A. Bahmani, F. Mueller in IEEE Big Data, Oct 2015.
A fine-grained block ILU scheme on regular structures for GPGPUs by L. Luo, J. Edwards, H. Luo, F. Mueller in Journal of Computers and Fluids, Vol. 119, Sep 2015, pages 149-161, DOI: 10.1016/j.compfluid.2015.07.005
"DINO: Divergent Node Cloning for Sustained Redundancy in HPC" by A. Rezaei, F. Mueller in Cluster, Sep 2015.
Hybrid EDF Packet Scheduling for Real-Time Distributed Systems by T. Qian, Frank Mueller in Euromicro Conference on Real-Time Systems (ECRTS), Jul 2015, pages 37-46.
A Numerical Soft Fault Model for Iterative Linear Solvers by James Elliott, Mark Hoemmen, Frank Mueller in High-Performance Parallel and Distributed Computing, Jun 2015.
Optimization of A Fine-grained BILU by CUDA Inter-block Synchronization by L. Luo, J. R. Edwards, H. Luo, F. Mueller, W.-C. Feng in AIAA Aviation, pages 3055-3071, Jun 2015.
"Providing Task Isolation via TLB Coloring" by S. Panchamukhi and F. Mueller in Real-Time and Embedded Technology and Applications Symposium, Apr 2015.
NoCMsg: A Scalable Message Passing Abstraction for Network-on-Chips by Christopher Zimmer, Frank Mueller in ACM Transactions on Architecture and Code Optimization (TACO), Vol. 12, No. 1, Apr 2015.
OpenACC Acceleration of an Unstructured CFD Solver Based on a Reconstructed Discontinuous Galerkin Method for Compressible Flows by Y. Xia, H. Luo, L. Luo, J. Edwards, J. Lou, F. Mueller in International Journal for Numerical Methods in Fluids, DOI 10.1002/fld.4009, Feb 2015.
Transparent Fault Tolerance for Job Input Data in HPC Environments by Chao Wang, Sudharshan S. Vazhkudai, Xiaosong Ma, and Frank Mueller, chapter in "Handbook on Data Centers", edited by Albert Y. Zomaya and Samee U. Khan, Springer, accepted in 2014.
"Architecture Aware Semi Partitioned Real-Time Scheduling on Multicore Platforms" by M. Shekhar, A. Sarkar, H. Ramaprasad, F. Mueller, Real-Time Systems Journal, Feb 2015, pages (accepted), doi:10.1007/s11241-015-9221-4
Advanced Optimizations of An Implicit Navier-Stokes Solver on GPGPU by L. Luo, J. Edwards, H. Luo, F. Mueller in AIAA SciTech, Jan 2015.
NoCMsg: Scalable Message Passing Abstraction for Network-on-Chips by Christopher Zimmer, Frank Mueller in ACM Transactions on Architecture and Code Optimization, Vol. ?, No. ?, Dec 2014 (accepted), pages ?-?.
"Affinity-Aware Checkpoint Restart" by A. Saini, A. Rezaei, F. Mueller, P. Hargrove, E. Roman in Middleware, Dec 2014.
"Exploiting Data Representation for Fault Tolerance" by James Elliott, Mark Hoemmen, and Frank Mueller, Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), Nov 2014.
A Methodology for Automatic Generation of Executable Communication Specifications from Parallel MPI Applications by X. Wu, F. Mueller, S. Pakin in ACM Transactions on Parallel Computing, Vol. 1, No. 1, Sep 2014, DOI 10.1145/2660249. Supplement
ScalaJack: Customized Scalable Tracing with in-situ Data Analysis by S. Ananthakrishnan, Frank Mueller in Euro-Par Conference, Aug 2014.
A Real-Time Distributed Hash Table by T. Qian, Frank Mueller in Conference on Embedded and Real-Time Computing Systems and Applications, Aug 2014.
"Semi-Partitioned Scheduling for Resource-Sharing Hard-Real-Time Tasks" by M. Shekhar, H. Ramaprasad, F. Mueller in TR 2014-8, Dept. of Computer Science, North Carolina State University, Jul 2014.
"Static Task Partitioning for Locked Caches in Multi-Core Real-Time Systems" by A. Sarkar, F. Mueller and H. Ramaprasad in ACM Transactions on Embedded Computing Systems (TECS), Vol. 14, No. 1, Jun 2015, pages 4:1-4:30.
"DINO: Divergent Node Cloning for Sustained Redundancy in HPC" by A. Rezaei, F. Mueller in TR 2014-7, Dept. of Computer Science, North Carolina State University, Jun 2014.
Snapify: Capturing Snapshots of Offload Applications on Xeon Phi Manycore Processors by Arash Rezaei, Giuseppe Coviello, Cheng-Hong Li, Srimat Chakradhar, Frank Mueller in High-Performance Parallel and Distributed Computing, Jun 2014.
Scalable Tracing of MPI Programs through Signature-Based Clustering Algorithms by A. Bahmani, F. Mueller in International Conference on Supercomputing, Jun 2014.
GPU Port of A Parallel Incompressible Navier-Stokes Solver based on OpenACC and MVAPICH2 by L. Luo, J. R. Edwards, H. Luo, F. Mueller in AIAA Aviation, Jun 2014.
OpenACC-Based Acceleration of a Reconstructed Discontinuous Galerkin Method for GPU Clusters by Y. Xia, H. Luo, J. Luo, J. Edwards, F. Mueller in AIAA Aviation, Jun 2014.
NoCMsg: Scalable NoC-Based Message Passing by Christopher Zimmer, Frank Mueller in International Symposium on Cluster, Cloud and Grid Computing, May 2014.
Evaluating the Impact of SDC on the GMRES Iterative Solver by James Elliott, Mark Hoemmen, Frank Mueller in International Parallel and Distributed Processing Symposium, May 2014.
Understanding the Tradeoffs between Software-Managed vs. Hardware-Managed Caches in GPUs by Chao Li, Yi Yang, Hongwen Dai, Shengen Yan, Frank Mueller, Huiyang Zhou in IEEE International Symposium on Performance Analysis of Systems and Software, Mar 2014.
"Resilience in Numerical Methods: A Position on Fault Models and Methodologies" by J. Elliott, M. Hoemmen, F. Mueller, invited talk at SIAM Conference on Computational Science and Engineering, Feb 2014.
"Tolerating Silent Data Corruption in Opaque Preconditioners" by J. Elliott, M. Hoemmen, F. Mueller, Computing Research Repository, Feb 2014.
Performance Assessment of A Multi-block Incompressible Navier-Stokes Solver using Directive-based GPU Programming in a Cluster Environment by L. Luo, J. R. Edwards, H. Luo, F. Mueller in AIAA Aerospace Sciences Meeting, Jan 2014.
OpenACC-based GPU Acceleration of a 3-D Unstructured Discontinuous Galerkin Method by Y. Xia, H. Luo, L. Luo, J. Edwards, J. Lou, F. Mueller in AIAA Aerospace Sciences Meeting, Jan 2014.
"Tools for Simulation and Benchmark Generation at Exascale" by Mahesh Lagadapati, Frank Mueller, and Christian Engelmann, Parallel Tools Workshop, Sep 2013.
Elastic and Scalable Tracing and Accurate Replay of Non-Deterministic Events by X. Wu, F. Mueller in International Conference on Supercomputing, Jun 2013, pages 59-68.
Benjamin Clay, Zhiming Shen, and Xiaosong Ma, Building and Scaling Virtual Clusters with Residual Resources from Interactive Clouds, poster, the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC '13), Jun 2013.
Mingliang Liu, Ye Jin, Jidong Zhai, Yan Zhai, Qianqian Shi, Xiaosong Ma, and Wenguang Chen, ACIC: Automatic Cloud I/O Configurator for Parallel Applications, poster, the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC '13), Jun 2013.
"Sustained Resilience via Live Process Cloning" by Arash Rezaei, Frank Mueller, Workshop on Dependable Parallel, Distributed and Network-Centric Systems, May 2013 (accepted).
"Quantifying the Impact of Single Bit Flips on Floating Point Arithmetic" by J. Elliott, F. Mueller, M. Stoyanov, C. Webster in TR 2013-2, Dept. of Computer Science, North Carolina State University, Mar 2013.
"Auto-Generation and Auto-Tuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters" by Y. Zhang and F. Mueller in Transactions on Parallel and Distributed Systems, Vol. 24, No. 3, Mar 2013, pages 417-427, DOI 10.1109/TPDS.2012.160.
"HiDP: A Hierarchical Data Parallel Language" by Y. Zhang and F. Mueller in International Symposium on Code Generation and Optimization, Feb 2013, accepted.
Devesh Tiwari, Simona Boboila, Sudharshan Vazhkudai, Youngjae Kim, Xiaosong Ma, Peter Desnoyers, Yan Solihin, "Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines", 11th USENIX Conference on File and Storage Technologies (FAST'13), Feb 2013.
"Exploiting Data Representation for Fault Tolerance" by J. Elliott, M. Hoemmen, F. Mueller, Computing Research Repository, Feb 2013.
"Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing" by D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira, R. Brightwell, in Supercomputing, Nov 2012, pages 78:1--78:12.
Devesh Tiwari, Sudharshan S. Vazhkudai, Youngjae Kim, Xiaosong Ma, Simona Boboila, and Peter J. Desnoyers, "Reducing Data Movement Costs Using Energy-Efficient, Active Computation on SSD", the 2012 USENIX Workshop on Power-Aware Computing and Systems (HotPower '12), co-located with USENIX OSDI 2012, Oct 2012.
"CuNesl: Compiling Nested Data-Parallel Languages for SIMT Architectures" by Y. Zhang, Frank Mueller in International Conference on Parallel Processing, Sep 2012, DOI 10.1109/ICPP.2012.21.
John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, Rajeev Thakur. Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments. In 2012 IEEE International Conference on Cluster Computing (CLUSTER). Pg. 468-476. 2012.
"Combining Partial Redundancy and Checkpointing for HPC" by J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira, C. Engelmann in International Conference on Distributed Computing Systems, Jun 2012, DOI 10.1109/ICDCS.2012.56.
"Evaluating Operating System Vulnerability to Memory Errors" by Kurt B. Ferreira, Kevin Pedretti, Patrick G. Bridges, Ron Brightwell, David Fiala and Frank Mueller, Workshop on Runtime and Operating Systems for Supercomputers, Jun 2012, DOI 10.1145/2318916.2318930.
Feng Ji, Ashwin Aji, James Dinan, Darius Buntinas, Pavan Balaji, Rajeev Thakur, Wu-Chun Feng and Xiaosong Ma, "DMA-Assisted, Intranode Communication in GPU Accelerated Systems", the 14th IEEE International Conference on High Performance Computing and Communications (HPCC-2012), June 2012.
Chao Wang, Sudharshan Vazhkudai, Xiaosong Ma, Fei Meng, Youngjae Kim, and Christian Engelmann, NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines, 2012 International Parallel and Distributed Processing Symposium (IPDPS '12), May 2012.
Feng Ji, Ashwin Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-Chun Feng and Xiaosong Ma, "Efficient Intranode Communication in GPU-Accelerated Systems", the Second International Workshop on Accelerators and Hybrid Exascale Systems (ASHES 2012), May 2012.
"Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing" by D. Fiala, F. Mueller, C. Engelmann, K. Ferreira, R. Brightwell, R. Riesen in TR 2012-5, Dept. of Computer Science, North Carolina State University, May 2012.
"ScalaBenchGen: Auto-Generation of Communication Benchmark Traces" by X. Wu, V. Deshpande, F. Mueller, in International Parallel and Distributed Processing Symposium, May 2012.
"ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs" by X. Wu, F. Mueller in ACM Transactions on Programming Languages and Systems, Vol. 34, No. 1, Apr 2012, DOI 10.1145/2160910.2160914.
"Low Contention Mapping of Real-Time Tasks onto a TilePro 64 Core Processor", by C. Zimmer and F. Mueller in Real-Time and Embedded Technology and Applications Symposium, Apr 2012, pages (accepted).
"Fault Resilient Real-Time Design for NoC Architectures", by C. Zimmer and F. Mueller in International Conference on Cyber-Physical Systems, Apr 2012, pages (accepted).
"Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters" by Y. Zhang and F. Mueller in International Symposium on Code Generation and Optimization, Apr 2012, pages (accepted).
"Assessing HPC Failure Detectors for MPI Jobs" by K. Kharbas, D. Kim, K. KC, T. Hoefler and F. Mueller in Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Feb 2012.
"Auto-Generation of Communication Benchmark Traces" by V. Deshpande, X. Wu, F. Mueller, ACM SIGMETRICS Performance Evaluation Review, Vol. 40, No. 2, 2012, pages 15-16.
"Auto-Generation of Communication Benchmark Traces" by V. Deshpande, X. Wu, F. Mueller, Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, Nov 2011, DOI 10.1145/2088457.2088468.
R. Benjamin Clay, Zhiming Shen, Xiaosong Ma and Xiaohui Gu, Augmenting MapReduce with Active Volunteer Resources, poster, ACM Symposium on Operating Systems Principles (SOSP '11), Oct 2011.
"A Tunable, Software-based DRAM Error Detection and Correction Library for HPC" by D. Fiala, K. Ferreira, F. Mueller, C. Engelmann, Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, Sep 2011, DOI 10.1007/978-3-642-29740-3_29.
"Probabilistic Communication and I/O Tracing with Deterministic Replay at Scale" by Xing Wu, Karthik Vijayakumar, Frank Mueller, Xiaosong Ma, Philip C. Roth in International Conference on Parallel Processing, Sep 2011.
"GStream: A General-Purpose Data Streaming Framework on GPU Clusters" by Yongpeng Zhang, Frank Mueller in International Conference on Parallel Processing, Sep 2011.
Comparing different approaches for Incremental Checkpointing: The Showdown by M. Vasavada, F. Mueller, P. Hargrove in Linux Symposium, Jun 2011, pages 69-79.
Automatic Generation of Executable Communication Specifications from Parallel Applications by X. Wu, F. Mueller, S. Pakin in International Conference on Supercomputing, Jun 2011, pages 12-21.
"Predictable Task Migration for Locked Caches in Multi-Core Systems" by A. Sarkar, F. Mueller and H. Ramaprasad in ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems, Jun 2011, pages 131-140.
"Failure Detection within MPI Jobs: Periodic Outperforms Sporadic" by K. Kharbas, D. Kim, K. KC, T. Hoefler and F. Mueller in TR 2011-13, Dept. of Computer Science, North Carolina State University, Jun 2011.
"A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems" by C. Zimmer and F. Mueller in TR 2011-13, Dept. of Computer Science, North Carolina State University, Jun 2011.
"Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs" by S. Ramanna, F. Mueller, T. Gamblin, ACM SIGMETRICS Performance Evaluation Review, Vol. 38, No. 4, Mar 2011, pages 30-36.
"Static Task Partitioning for Locked Caches in Multi-Core Real-Time Systems" by A. Sarkar, F. Mueller, and H. Ramaprasad in TR 2011-11, Dept. of Computer Science, North Carolina State University, Mar 2011.
"Probabilistic Communication and I/O Tracing with Deterministic Replay at Scale" by X. Wu, K. Vijayakumar, F. Mueller, X. Ma and P. C. Roth in TR 2011-6, Dept. of Computer Science, North Carolina State University, Mar 2011.
"ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs" by X. Wu, F. Mueller in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 2011, pages 113-122.
"Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs" by S. Ramanna, F. Mueller, T. Gamblin, The Computer Journal, Oxford University Press, Vol. ?, No. ?, accepted Jun 2011, pages 1-12, DOI 10.1093/comjnl/bxr071.
"Proactive Process-Level Live Migration and Back Migration in HPC Environments" by C. Wang, F. Mueller, C. Engelmann and S. Scott in Journal of Parallel and Distributed Computing, V ?, No ?, Oct 2011 (accepted), pages ?, DOI 10.1016/j.jpdc.2011.10.009.

Theses:

"Compiler-based Auto-tuning and Synchronization Validation for HPC Applications" by T. Wang, Ph.D. Thesis, North Carolina State University, Dec 2019 (last known position: TBD)
"Predicting Location and Time of Anomalies in Large-Scale Computing Systems via LogMining" by A. Das, Ph.D. Thesis, North Carolina State University, Aug 2019 (last known position: postdoc, Stanford U, CA)
"On the Management of Power Constraints for High Performance Systems" by N. Gholkar, Ph.D. Thesis, North Carolina State University, Aug 2018 (last known position: Intel, CA)
"Providing DRAM Predictability for Real-Time Systems and Beyond" by X. Pan, Ph.D. Thesis, North Carolina State University, May 2018 (last known position: Baidu, China)
"Pragma-Based Compiler Extension for End-to-End Resiliency Against Soft Faults" by Harsh Khetawat, M.S. Thesis, North Carolina State University, Nov 2017 (last known position: Ph.D. student, NCSU)
"Reducing Hadoop's long tail with Process Cloning" by Sarthak Kukreti, M.S. Thesis, North Carolina State University, Aug 2017 (last known position: Google, CA)
"End-to-end Predictability for Distributed Real-Time Systems" by T. Qian, Ph.D. Thesis, North Carolina State University, May 2017 (last known position: VMWare, CA)
"Scalable Communication Tracing via Clustering" by A. Bahmani, Ph.D. Thesis, North Carolina State University, May 2017 (last known position: research staff, Stanford Univ., CA)
"Fault Resilience for Next Generation HPC Systems" by A. Rezaei, Ph.D. Thesis, North Carolina State University, Mar 2016 (last known position: Samsung, CA)
"Server-side Log Data Analytics for I/O Workload Characterization and Coordination on Large Shared Storage Systems" by Y. Liu, Ph.D. Thesis, North Carolina State University, Mar 2016 (last known position: Epic Systems Corporation, WI)
"Resilient Iterative Linear Solvers Running Through Errors" by J. Elliott, Ph.D. Thesis, North Carolina State University, Oct 2015 (last known position: Sandia Nat'l Lab, NM)
"Transparent Resilience Across the Entire Software Stack for High-Performance Computing Applications" by D. Fiala, Ph.D. Thesis, North Carolina State University, Jun 2015 (last known position: Google, CA)
"ScalaMemAnalysis-MultiLevel: A Compositional Approach to Multi-level Cache Analysis of Compressed Memory Traces" by Saransh Gupta, M.S. Thesis, North Carolina State University, Aug 2015 (last known position: Intel, OR)
"Analysis of Memory Performance and Execution Models for Large-Scale Manycores" by Vishwanathan Chandru, M.S. Thesis, North Carolina State University, Aug 2015 (last known position: Intel, IL)
"Hybrid Cache, Bank, and Controller Aware Coloring for Multicore Real-Time Systems" by Yasaswini Gownivaripalli, M.S. Thesis, North Carolina State University, Jun 2015 (last known position: Intel, OR)
"ScalaIOExtrap: Elastic I/O Tracing and Extrapolation" by Xiaoqing Luo, M.S. Thesis, North Carolina State University, Jun 2015 (last known position: TBD)
"Bringing Efficiency and Predictability to Massive Multi-core NoC Architectures" by C. Zimmer, Ph.D. Thesis, North Carolina State University, Dec 2012 (last known position: Cisco, NC)
"Scalable Communication Tracing for Performance Analysis of Parallel Applications" by X. Wu, Ph.D. Thesis, North Carolina State University, Dec 2012 (last known position: Amazon, WA)
"Exploiting Data-Parallelism in GPUs" by Y. Zhang, Ph.D. Thesis, North Carolina State University, Sep 2012 (last known position: Stone Ridge Technologies, MD)
"Power Balancing Cloud-Based Workloads" by Sandeep Kandula, M.S. Thesis, North Carolina State University, Aug 2014 (last known position: Amazon, WA)
"Providing Task Isolation via TLB Coloring" by Shrinivas Panchamukhi, M.S. Thesis, North Carolina State University, Jul 2014 (last known position: Intel, OR)
"Efficient and Lightweight Inter-process Collective Operations for Massive Multi-core Architectures" by Onkar Patil, M.S. Thesis, North Carolina State University, Jun 2014 (last known position: NetApp, NC)
"ScalaMemAnalysis: A Compositional Approach to Cache Analysis of Compressed Memory Traces" by Nishanth Balasubramanian, M.S. Thesis, North Carolina State University, Jun 2014 (last known position: Nvidia, CA)
"Distributed Job Allocation for Large-Scale Many-cores" by Subramanian Ramachandran, M.S. Thesis, North Carolina State University, May 2014 (last known position: Riverbed, CA)
"Affinity-Aware Checkpoint Restart" by Anjay Saini, M.S. Thesis, North Carolina State University, May 2014 (last known position: Intel, OR)
"Benchmark Generation and Simulation at Extreme Scale" by Mahesh Lagadapati, M.S. Thesis, North Carolina State University, May 2014 (last known position: Nvidia, CA)
"Collective Communication for Multi-core NOC Interconnects" by Karthik Yagna, M.S. Thesis, North Carolina State University, May 2013 (last known position: Riverbed Technologies, CA)
"Scalable Locks with Backoff Suspension for Manycore Systems" by Chadan Apsangi, M.S. Thesis, North Carolina State University, May 2013 (last known position: Intel, OR)
"Customized Scalable Tracing with in-situ Data Analysis" by Srinash Krishna Ananthakrishnan, M.S. Thesis, North Carolina State University, May 2013 (last known position: Riverbed Technologies, CA)
"Automatic Generation of Complete Communication Skeletons from Traces" by Vivek Deshpande, M.S. Thesis, North Carolina State University, Aug 2011 (last known position: Intel, OR)
"Failure Detection and Partial Redundancy in HPC" by Kishor Kharbas, M.S. Thesis, North Carolina State University, Aug 2011 (last known position: Intel, OR)
"Design and Implementation of Process Migration and Cloning in BLCR" by Shobit Mishra, M.S. Thesis, North Carolina State University, Aug 2011 (last known position: Intel, CA)
"This material is based upon work supported by the National Science Foundation under Grant No. 0958311."
"Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."