Preserving Time in Large-Scale Communication Traces

Presenter: Prasun Ratn

Paper link

Abstract

Analyzing the performance of large-scale scientific applications is becoming increasingly difficult due to the sheer size of the performance data gathered. Recent work on scalable communication tracing applies online interprocess compression to address this problem. Yet, analysis of communication traces requires knowledge about time progression that cannot trivially be encoded in a scalable manner during compression. We develop scalable time-stamp encoding schemes for communication traces. At the same time, our work contributes novel insights into the scalable representation of time-stamped data. We show that our representations capture sufficient information to enable what-if explorations of architectural variations and analysis for path-based timing irregularities while not requiring excessive disk space. We evaluate the ability of several time-stamped compressed MPI trace approaches to enable accurate timed replay of communication events. Our lossless traces are orders of magnitude smaller, if not near constant size, regardless of the number of nodes while preserving timing information suitable for application tuning or assessing requirements of future procurements. Our results prove that time-preserving tracing without loss of communication information can scale in the number of nodes and time steps, which is a result without precedent.
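
The abstract does not spell out how time stamps are encoded, so the following is only an illustrative sketch under stated assumptions, not the paper's scheme. It assumes that absolute time stamps are replaced by delta times (the gap since the previous event) and that the deltas of a repeated event are folded into a fixed-size summary, so storage does not grow with the number of loop iterations. The names `DeltaStats` and `summarize`, and the sample trace, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeltaStats:
    """Constant-size summary of the time gaps preceding one repeated event."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, delta: float) -> None:
        # Fold one more observed gap into the running summary.
        self.count += 1
        self.total += delta
        self.minimum = min(self.minimum, delta)
        self.maximum = max(self.maximum, delta)

    @property
    def mean(self) -> float:
        return self.total / self.count


def summarize(events):
    """Summarize delta times per event name.

    events: iterable of (event_name, absolute_timestamp) in program order.
    Each delta is the time elapsed since the previous event of any kind.
    Returns a dict mapping event_name to DeltaStats.
    """
    stats = {}
    previous = None
    for name, timestamp in events:
        if previous is not None:
            stats.setdefault(name, DeltaStats()).add(timestamp - previous)
        previous = timestamp
    return stats


# Hypothetical trace: a send/recv pair repeated three times.
trace = [("send", 0.0), ("recv", 2.0), ("send", 3.0),
         ("recv", 5.0), ("send", 6.0), ("recv", 8.0)]
summary = summarize(trace)
```

In this sketch the summary stays the same size no matter how many times the send/recv pair repeats. The trade-off is that individual gaps are no longer recoverable, so a replay tool would have to draw delays from the recorded mean or range.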