ScalaTrace
ScalaTrace Overview
ScalaTrace is an MPI tracing toolset that produces communication traces that are
orders of magnitude smaller, often near-constant in size, regardless of the
number of nodes, while preserving structural information. Combining
intra- and inter-node compression techniques for MPI events,
the trace tool extracts an application's communication structure.
A replay tool allows communication events recorded by our
trace tool to be issued in an order-preserving manner without running
the original application code.
The tool has been tested so far on BlueGene and x86_64 platforms with
different MPI implementations. ScalaTrace may be used for
communication tuning, procurement of future machines, and beyond. To the
best of our knowledge, such a concise, scalable representation of MPI traces
combined with deterministic MPI call replay is without precedent.
Detailed overview:
Scalable Compression, Replay and
Extrapolation of Communication and I/O Traces in Massively Parallel
Environments
MPI Introduction:
First of all, here's a quick MPI tutorial with examples:
MPI Tutorial
For more details about parallel computing, you can refer to this book online:
Book
Compiling MPI programs:
On most platforms you have mpicc, mpicxx and mpif77/mpif90 as the C, C++ and Fortran compilers (many
scientific benchmarks are written in Fortran, and the ScalaTrace framework is written in C/C++). These are
usually wrappers around the GCC or Intel compilers.
On BG/L we normally use the IBM compilers (though you can use the GCC compilers if you want). The corresponding
compilers on BG/L are mpixlc, mpixlcxx and mpixlf77 (wrappers around blrts_xl*).
Compile:
$ mpixlc -o main main.c
Running MPI programs:
Many supercomputers and clusters have batch systems: you submit jobs to a queue and the scheduler takes care of
running them. The normal sequence is as follows:
Run:
$ cqsub -t <time> -n <nodes> ./main
(On BG/L you need to specify the expected run time in minutes; if the program is still running after that time, it is terminated.)
This command queues your job for the scheduler to run. It also outputs a job ID, e.g. 20495.
Once the program terminates, you can check the stdout and stderr output in files named, e.g., 20495.output and 20495.error.
If your program segfaults and crashes, you will also get one core file per segfaulting node, named core.n, where n is the node ID.
Check the status of the queue:
$ cqstat
Delete a job:
$ cqdel <job-id>
Building ScalaTrace library:
To build the library:
$ cd record
$ make
Three libraries are built by default in record/lib:
- libdump.a (flat traces, no compression)
- libnode.a (intra-node compression only)
- libglob.a (global compression)
You might have to modify record/Makefile depending on which version(s) you want to compile.
To change compilers, edit record/config/Makefile.config.
To enable/disable timing deltas, edit record/libsrc/Makefile.libsrc.
The main library source is in record and common. Please read the README and BUILD
files in record.
To build samples:
$ make test
You can select which samples to build by editing
record/tests/Makefile.
This step compiles a sample MPI program and links it with the library we built above.
Running samples:
Run them as you would run any MPI program. Once the program runs successfully, it
generates a folder called recorded_ops_n, where n is the number of nodes you ran on.
This folder contains trace files named 0, 1, ..., n-1. If you link with -lglob, there will be only
one file, named 0. There is also a file called "times" with the running time information.
Reading trace files:
Reading trace files directly can be cumbersome.
Go to the rcat directory and run 'make rcat'.
The rcat tool transforms a trace into a more readable format.
See rcat -h for options (-p, -e, and to some extent -c
are the ones I find most useful).
People
Frank Mueller
Martin Schulz
Bronis de Supinski
Prasun Ratn
Todd Gamblin
Mike Noeth
Karthik Vijayakumar
Sandeep Budanur Ramanna
- ScalaTrace V4 (adds several clustering options)
- ScalaTrace V3 (adds
extrapolation of MPI, MPI-IO and POSIX I/O traces; further improves MPI-IO and POSIX I/O support)
- ScalaTrace V2.2 (adds elastic
compression; redesigns MPI-IO and POSIX I/O tracing)
- ScalaTrace V0.5 (adds
MPI-IO and POSIX I/O tracing with lossless and lossy/histogram recordings)
- ScalaMem V0.1 (Scalable
record and replay framework for memory references under x86 with PIN)
Publications
- ScalaJack: Customized Scalable Tracing with in-situ Data Analysis by S. Ananthakrishnan, F. Mueller in
Euro-Par Conference, Aug 2014 (accepted).
-
Scalable Tracing of MPI Programs through Signature-Based Clustering Algorithms
by A. Bahmani, F. Mueller
in International Conference on Supercomputing, Jun 2014 (accepted).
-
Elastic and Scalable Tracing and Accurate Replay of Non-Deterministic Events
by X. Wu, F. Mueller
in International Conference on Supercomputing, Jun 2013 (accepted).
-
"ScalaBenchGen:
Auto-Generation of Communication Benchmark Traces"
by X. Wu, V. Deshpande, F. Mueller, in
International Parallel and Distributed Processing Symposium, May 2012
DOI 10.1109/IPDPS.2012.114.
-
"Probabilistic Communication and I/O Tracing with Deterministic Replay
at Scale" by Xing Wu, Karthik Vijayakumar, Frank
Mueller, Xiaosong Ma, Philip C. Roth in International Conference
on Parallel Processing, Sep 2011, pages 196-205.
-
Automatic Generation of Executable Communication Specifications from Parallel Applications
by X. Wu, F. Mueller, S. Pakin
in International Conference on Supercomputing, Jun 2011, pages 12-21.
-
"ScalaExtrap: Trace-Based Communication Extrapolation for SPMD Programs"
by X. Wu, F. Mueller
in ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming, Feb 2011, pages 113-122.
-
"ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale"
by F. Mueller, X. Wu, M. Schulz, B. de
Supinski, T. Gamblin in Para 2010: State of the Art in
Scientific and Parallel Computing (invited), Springer LNCS 7133,
eds. K. Jonasson, Jun 2010, pages 410-418.
-
"ScalaTrace: Scalable
Compression and Replay of Communication Traces in High Performance Computing"
by M. Noeth and P. Ratn and F. Mueller and M. Schulz and B. de
Supinski,
Journal of Parallel and Distributed Computing, V ?, No ?, accepted Sep
2008, pages ???.
-
"Preserving Time in Large-Scale Communication Traces"
by P. Ratn and F. Mueller and M. Schulz and B. de
Supinski
in International Conference on Supercomputing, Jun 2008, pages 46-55.
-
"Scalable Compression and Replay of Communication Traces in Massively Parallel Environments"
by M. Noeth and F. Mueller and M. Schulz and B. de
Supinski in International Parallel and Distributed Processing Symposium, Mar
2007, Best Paper Award.
-
"An Open Infrastructure for Scalable, Reconfigurable Analysis"
by B. de Supinski and R. Fowler and T. Gamblin
and F. Mueller and P. Ratn and M. Schulz
in International Workshop on
Scalable Tools for High-End Computing, Jun 2008, pages 39-50.
-
"An Open Framework for Scalable, Reconfigurable Performance
Analysis" by T. Gamblin, P. Ratn, B. de
Supinski, M. Schulz, F. Mueller, R. Fowler and D. Reed, refereed poster
at Supercomputing, Nov 2007.
-
"Scalable Compression and Replay of Communication Traces in Massively Parallel Environments"
by M. Noeth and F. Mueller and M. Schulz and B. de
Supinski, refereed poster at Supercomputing, Nov 2006.