An Open Framework for Scalable, Reconfigurable Performance Analysis

Todd Gamblin¹, Prasun Ratn²,³, Bronis R. de Supinski³, Martin Schulz³, Frank Mueller², Robert J. Fowler¹, Daniel Reed¹

¹Renaissance Computing Institute, ²North Carolina State University, ³Lawrence Livermore National Laboratory

ScalaTrace: Reconfigurable Scalable Performance Analysis

The ScalaTrace compression framework provides:

Near-constant-size, low-overhead MPI traces
The ability to annotate traces with additional reconfigurable data, e.g. time (using adaptive histograms), progress rates, and load imbalance

ScalaReplay: Replay Using Histogram Timing Annotations

Figure: Bins generated for synthetic input span the entire range with similar sample counts

Idea: preserve time in compressed traces
Encode time deltas instead of timestamps
Create delta histograms automatically
Dynamically balance histograms

Figure: Number of histograms per record depends on the number of possible call paths

Path-sensitive histograms
Time depends on path taken
Distinguish histograms by path
Sample:

    MPI_Allreduce (..);
    for (..) {
        for (..) {
            MPI_Send (..);
            MPI_Recv (..);
        }
        MPI_Barrier (..);
    }

Figure: Sample bimodal distribution from UMT2k collectives

Histograms detect imbalances
Variable sizes capture variance

Figure: Trace sizes (NAS Benchmarks and UMT2k)

The benchmarks fall into three categories:
Near-constant trace sizes, e.g. DT, EP, LU
Sub-linear trace sizes, e.g. CG, MG, FT
Non-scalable trace sizes, e.g. BT, IS, UMT2k

Figure: Replay accuracy (NAS Benchmarks and UMT2k)

The benchmarks fall into three categories:
Accurate replay: DT, EP, FT, LU, IS, UMT2k
Replay inaccurate in MPI time: CG, MG
Replay inaccurate in compute time: BT
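The timing annotations described above (time deltas rather than timestamps, one histogram per call path, bins balanced by sample count) can be illustrated with a minimal sketch. It is not ScalaTrace's implementation: the function names are hypothetical, the "path" is approximated by the immediately preceding call, the timestamps are synthetic, and the bins are balanced offline over a complete sample list rather than dynamically during tracing.

```python
from collections import defaultdict


def path_deltas(events):
    """Group time deltas by (previous call, current call).

    `events` is a list of (call_name, timestamp) pairs in program order.
    Keying on the immediate predecessor is an assumed, simplified
    stand-in for a full call path.
    """
    deltas = defaultdict(list)
    prev_name, prev_time = events[0]
    for name, time in events[1:]:
        # Store the delta since the previous event, not the timestamp.
        deltas[(prev_name, name)].append(time - prev_time)
        prev_name, prev_time = name, time
    return dict(deltas)


def balanced_bins(samples, num_bins):
    """Split samples into bins holding near-equal sample counts.

    Returns a list of (low, high, count) tuples. Bin widths vary so
    that dense and sparse regions of the range are both covered.
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        return []
    num_bins = min(num_bins, n)
    bins = []
    for i in range(num_bins):
        chunk = ordered[i * n // num_bins:(i + 1) * n // num_bins]
        bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins


def sample_trace():
    """Event sequence of the sample loop nest, two iterations per loop."""
    names = ["MPI_Allreduce"]
    for _ in range(2):
        for _ in range(2):
            names += ["MPI_Send", "MPI_Recv"]
        names.append("MPI_Barrier")
    # Synthetic timestamps (assumption): one time unit per call.
    return [(name, float(t)) for t, name in enumerate(names)]


histograms = path_deltas(sample_trace())
send_paths = sorted(prev for prev, cur in histograms if cur == "MPI_Send")
```

In this sketch `MPI_Send` is reached from three predecessors (`MPI_Allreduce` on entry, `MPI_Recv` inside the inner loop, `MPI_Barrier` when the outer loop repeats), so its record would carry three histograms, which matches the point that the histogram count depends on the number of possible call paths.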

Evolutionary Load-Balance Analysis with Scalable Data Collection

Idea: Normalize measurements and models based on application semantics

Progress loops

Typically outer loops in SPMD codes indicate absolute progress towards some domain-specific goal
Basis for comparison of load over time

Effort loops

Progress instrumentation

Effort modeled with code regions
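As a rough illustration of comparing load over time, the sketch below records per-process effort for each code region at every progress-loop iteration and reduces it to one imbalance value per region and step. The metric (maximum over mean, minus one) is a common choice and an assumption here, not necessarily the one used in this work; the function names and the region name `force` are hypothetical.

```python
from statistics import mean


def load_imbalance(per_process_effort):
    """Return max/mean - 1 for one region at one progress step.

    0.0 means perfectly balanced; 1.0 means the busiest process did
    twice the average work. This metric is an assumption.
    """
    avg = mean(per_process_effort)
    if avg == 0:
        return 0.0
    return max(per_process_effort) / avg - 1.0


def imbalance_over_time(effort):
    """Map each progress step to per-region imbalance values.

    `effort` is a list indexed by progress step; each entry maps a
    region name to the list of effort values measured on each process.
    """
    return [
        {region: load_imbalance(values) for region, values in step.items()}
        for step in effort
    ]


# Two progress steps, four processes, one hypothetical region.
history = imbalance_over_time([
    {"force": [1, 1, 1, 1]},
    {"force": [2, 2, 2, 6]},
])
```

Because every step is normalized by its own mean, values from early and late iterations are directly comparable even when the total amount of work changes.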

Load Balance in ParaDiS

ParaDiS models dislocation dynamics in crystals

Future Directions

Flexible framework for application-specific tools

Near term