Lecture Notes and Reading Material

We have online access to IEEE and ACM publications (among others). You have to find and download most of the articles below yourself, which gives you some experience in literature search on the side. If a download fails, please read up on the access mechanisms and report problems to the library.

Introduction PPT
Rauber Chapter 2 / Foster Chapters 1+3
- J.L. Gustafson, G.R. Montry and R.E. Benner, Development Of Parallel Methods For A 1,024-Processor Hypercube, SIAM Journal on Scientific and Statistical Computing, Vol. 9, No. 4, July 1988.
- C. Lameter, An Overview of Non-Uniform Memory Access, Communications of the ACM, Vol. 56, No. 9, Pages 59-65, Sep 2013.
Message Passing PPT
Rauber Chapter 5, Foster Chapter 8
- Gropp, W., Lusk, E., Doss, N., Skjellum, A., A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard, Parallel Computing, North-Holland, vol. 22, pp. 789-828
NVIDIA CUDA
Intro PPT
Pi Examples and source code
- S. Ryoo, C. Rodrigues, S. Baghsorkhi, S. Stone, Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, PPoPP'08
Shared Memory Programming PPT
Rauber Chapter 6
- OpenMP Standard
Parallel Architecture Overview PPT
no textbook chapter
- Top 500, specifically:
  - 6/2023 Top 500 List
Caches and Memory Systems PPT
Rauber Chapter 4
- R. Whaley, A. Petitet, and J. Dongarra, Automated Empirical Optimizations of Software and the ATLAS Project in Parallel Computing 27(1-2):3-25, January 2001
Big Data: Map-Reduce and Spark PPT
- Spark: Cluster Computing with Working Sets, M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker and I. Stoica, HotCloud 2010, June 2010
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica, NSDI 2012, San Jose, CA, April 2012
Fault Tolerance Overview (slides), covering:
- "A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance" by C. Wang and F. Mueller and C. Engelmann and S. Scott in International Parallel and Distributed Processing Symposium, Mar 2007.
- "Proactive Process-Level Live Migration and Back Migration in HPC Environments" by C. Wang, F. Mueller, C. Engelmann and S. Scott in Journal of Parallel and Distributed Computing, V 72, No 2, Feb 2012, pages 254-267, DOI 10.1016/j.jpdc.2011.10.009.
- "Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing" by D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira, R. Brightwell in Supercomputing, Nov 2012, pages 78:1--78:12.
- "Combining Partial Redundancy and Checkpointing for HPC" by J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira, C. Engelmann in International Conference on Distributed Computing Systems, Jun 2012, DOI 10.1109/ICDCS.2012.56.
- Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC by Anwesha Das, Frank Mueller, Charles Siegel, Abhinav Vishnu in High-Performance Parallel and Distributed Computing (HPDC), Jun 2018, pages 40-51.
Machine Learning and TensorFlow/Keras PPT
- TensorFlow: A System for Large-Scale Machine Learning, Martín Abadi et al., OSDI 2016.
- Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools, Ruben Mayer, Hans-Arno Jacobsen, ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: September 2019
Performance PPT
Rauber Chapter 4, Foster Chapter 3
- Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52, 4 (April 2009), 65-76.
- Jackson Marusarz, Max Katz, Charlene Yang and Samuel Williams, 2020. Accelerating HPC Applications with NVIDIA Nsight Compute Roofline Analysis
Interconnection & Communication PPT
Rauber Chapters 1+4, Culler Chapter 10
- David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: towards a realistic model of parallel computation. In Proceedings of Principles and Practice of Parallel Programming, 1993.
- S. Kamil, L. Oliker, A. Pinar, J. Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009 (in press).
Power Management (slides):
- "Uncore Power Scavenger: A Runtime for Uncore Power Conservation on HPC Systems" by Neha Gholkar, Frank Mueller, Barry Rountree, in Supercomputing (SC), Nov 2019, pages.
- PShifter: Feedback-based Dynamic Power Shifting within HPC Jobs for Performance by Neha Gholkar, Frank Mueller, Barry Rountree in High-Performance Parallel and Distributed Computing (HPDC), Jun 2018, pages 106-117.
- Power Tuning HPC Jobs on Power-Constrained Systems by Neha Gholkar, Frank Mueller, Barry Rountree in International Conference on Parallel Architecture and Compilation Techniques (PACT), Sep 2016.
Parallel I/O PPT
- Ceph: A Scalable, High-Performance Distributed File System by Sage Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Carlos Maltzahn in Proceedings of the 7th Conference on Operating Systems Design and Implementation (OSDI '06).
- S. Ghemawat, H. Gobioff, S. Leung, The Google File System, Symposium on Operating Systems Principles, October 2003.

Profiling/Tracing: ScalaTrace Overview (slides):

- ScalaTrace: Scalable Compression and Replay of Communication Traces in High Performance Computing by M. Noeth and P. Ratn and F. Mueller and M. Schulz and B. de Supinski, Journal of Parallel and Distributed Computing, V 69, No 8, Aug 2009, pages 696-710.
- ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs by X. Wu, F. Mueller in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 2011, pages 113-122.