Lecture Notes and Reading Material
We have online
access to IEEE and ACM publications (among others). Most articles
below must be located and downloaded by you, which gives you
some experience in literature search on the side. If you have
problems with a download, please read about the access
mechanisms for these publications and report problems to the library.
- Introduction
PPT
Rauber Chapter 2 / Foster Chapters 1+3
- J.L. Gustafson, G.R. Montry and R.E. Benner,
Development
Of Parallel Methods For A 1,024-Processor Hypercube, SIAM
Journal on Scientific and Statistical Computing, Vol. 9, No. 4,
July 1988.
- C. Lameter
An Overview of Non-Uniform Memory Access, Communications of
the ACM, Vol. 56 No. 9, Pages 59-54, Sep 2013.
- Message Passing
PPT
Rauber Chapter 5, Foster Chapter 8
- Gropp, W., Lusk, E., Doss, N., Skjellum, A.
A High-Performance, Portable Implementation of the MPI Message
Passing Interface Standard
Parallel Computing, North-Holland, vol. 22, pp. 789-828
- NVIDIA CUDA
Intro PPT
Pi Examples and
source code
- Shared Memory Programming
PPT
Rauber Chapter 6
- Parallel Architecture Overview
PPT
no textbook chapter
- Caches and Memory Systems
PPT
Rauber Chapter 4
- Big Data: Map-Reduce and Spark
PPT
- Spark: Cluster Computing with Working Sets, M. Zaharia,
M. Chowdhury, M. Franklin, S. Shenker and I. Stoica, HotCloud
2010, June 2010
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing, Matei Zaharia, Mosharaf
Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy
McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica,
NSDI 2012, San Jose, CA, April 2012
- Fault Tolerance Overview (slides), covering:
- "A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance"
by C. Wang, F. Mueller, C. Engelmann and S. Scott
in International Parallel and Distributed Processing Symposium, Mar
2007.
- "Proactive Process-Level Live Migration and Back Migration in HPC Environments"
by C. Wang, F. Mueller, C. Engelmann and S. Scott
in Journal of Parallel and Distributed Computing, V 72,
No 2, Feb 2012, pages 254-267, DOI 10.1016/j.jpdc.2011.10.009.
- "Detection and Correction of Silent Data Corruption for Large-Scale
High-Performance Computing" by D. Fiala, F. Mueller, C. Engelmann, R. Riesen, K. Ferreira, R. Brightwell
in Supercomputing, Nov 2012, pages 78:1-78:12.
- "Combining Partial Redundancy and Checkpointing for HPC" by J. Elliott, K. Kharbas, D. Fiala, F. Mueller, K. Ferreira,
C. Engelmann in International
Conference on Distributed Computing Systems, Jun 2012, DOI 10.1109/ICDCS.2012.56.
- Desh: Deep Learning for System Health Prediction of Lead
Times to Failure in HPC by
Anwesha Das, Frank Mueller, Charles Siegel, Abhinav Vishnu in
High-Performance Parallel and Distributed Computing (HPDC), Jun
2018, pages 40-51.
- Machine Learning and TensorFlow/Keras
PPT
- Performance
PPT
Rauber Chapter 4, Foster Chapter 3
- Interconnection & Communication
PPT
Rauber Chapters 1+4, Culler Chapter 10
- David Culler, Richard Karp, David Patterson, Abhijit
Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and
Thorsten von Eicken.
LogP: towards a realistic model of parallel computation.
In Proceedings of Principles and Practice of Parallel
Programming, 1993.
- S. Kamil, L. Oliker, A. Pinar, J. Shalf, "Communication Requirements and Interconnect Optimization for High-End Scientific Applications", IEEE Transactions on Parallel and Distributed Systems (TPDS), 2009 (in press).
- Power Management (slides):
- "Uncore Power Scavenger: A
Runtime for Uncore Power Conservation on HPC Systems"
by Neha Gholkar, Frank Mueller, Barry Rountree
in Supercomputing (SC), Nov 2019.
- PShifter:
Feedback-based Dynamic Power Shifting within HPC Jobs for Performance
by Neha Gholkar, Frank Mueller, Barry Rountree in
High-Performance Parallel and Distributed Computing (HPDC), Jun
2018, pages 106-117.
- Power Tuning HPC Jobs on Power-Constrained Systems
by Neha Gholkar, Frank Mueller, Barry Rountree
in International Conference on Parallel Architecture and
Compilation Techniques (PACT), Sep 2016.
- Parallel I/O PPT
- Ceph: A Scalable, High-Performance Distributed File System
by Sage Weil, Scott A. Brandt, Ethan L. Miller,
Darrell D. E. Long, Carlos Maltzahn in Proceedings of the 7th
Conference on Operating Systems Design and Implementation
(OSDI '06).
- S. Ghemawat, H. Gobioff, S. Leung,
The Google File System, Symposium on Operating Systems
Principles, October 2003.
- Profiling/Tracing: ScalaTrace Overview (slides):