In the actual MPICTrace system, each function call is recorded at runtime using the pre and post wrappers of Umpire. For most functions, the post wrapper records the call being made, along with its relevant parameters (e.g. the source and destination of a communication). All parameters required for a complete replay of the trapped procedure are stored in a replay_op structure.
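As an illustration of the recording step, the following is a minimal sketch of what a replay record and a post wrapper for a send might look like. Only the name replay_op comes from the text; the fields, the op_code enumeration and the record_send helper are assumptions made for this example and are not the actual MPICTrace definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical operation codes; the real system has its own identifiers. */
enum op_code { OP_SEND, OP_RECV, OP_BARRIER };

/* Assumed layout of a replay record: the data needed to re-issue the call. */
struct replay_op {
    enum op_code op; /* which MPI call was trapped */
    int src;         /* source rank of the communication */
    int dest;        /* destination rank */
    int count;       /* number of elements transferred */
    int tag;         /* message tag */
};

/* Hypothetical helper: what a post wrapper for a send might do, namely
 * copy the parameters of the completed call into a fresh record. */
struct replay_op record_send(int my_rank, int dest, int count, int tag)
{
    struct replay_op rec;

    assert(count >= 0);          /* a negative element count is invalid */
    memset(&rec, 0, sizeof rec); /* start from a fully zeroed record */
    rec.op = OP_SEND;
    rec.src = my_rank;
    rec.dest = dest;
    rec.count = count;
    rec.tag = tag;
    return rec;
}
```

For example, record_send(2, 5, 128, 7) yields a record describing a send of 128 elements from rank 2 to rank 5 with tag 7.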
The structure thus generated is then passed to the compress_node function, along with an rsd_queue object that is declared globally on every node running the trace compression. The compress_node function inserts the recorded operation into the rsd_queue and performs the intranode compression (4.3.1) on the fly.
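The insert-and-compress step can be sketched as follows. This is a deliberately simplified stand-in for the intranode compression: it only folds an operation into the tail of the queue when it is identical to the last recorded one, whereas the actual algorithm (4.3.1) is more general. The struct layouts, the fixed capacity QUEUE_CAP and the function name compress_node_sketch are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 64 /* assumed fixed capacity, for the sketch only */

/* Assumed minimal replay record. */
struct replay_op { int op; int src; int dest; int count; };

/* One queue entry: an operation plus how many times it repeats in a row. */
struct rsd_entry { struct replay_op op; int repeats; };

struct rsd_queue { struct rsd_entry entries[QUEUE_CAP]; size_t len; };

/* Two records match when every replay parameter is equal. */
static int same_op(const struct replay_op *a, const struct replay_op *b)
{
    return a->op == b->op && a->src == b->src &&
           a->dest == b->dest && a->count == b->count;
}

/* Insert op into the queue. Returns 1 if it was folded into the tail
 * entry (compressed), 0 if a new entry had to be appended. */
int compress_node_sketch(struct rsd_queue *q, const struct replay_op *op)
{
    if (q->len > 0 && same_op(&q->entries[q->len - 1].op, op)) {
        q->entries[q->len - 1].repeats++; /* extend the existing run */
        return 1;
    }
    assert(q->len < QUEUE_CAP); /* the sketch does not grow the queue */
    q->entries[q->len].op = *op;
    q->entries[q->len].repeats = 1;
    q->len++;
    return 0;
}
```

With this sketch, ten identical sends followed by one receive occupy two queue entries instead of eleven, which is the kind of saving that compressing on the fly is meant to provide.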
Internode compression (4.3.2) is done at the end, in the post wrapper of the MPI_Finalize call. The nodes are arranged in a binary tree; each node aggregates the information from its children and passes the result to its parent.
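The tree arrangement can be sketched as below. The text does not say how ranks are mapped onto the tree, so this example assumes the common heap-style layout in which rank r has parent (r - 1) / 2 and children 2r + 1 and 2r + 2. The function names are hypothetical, and the aggregation is simulated in a single process by summing per-rank counts, which stands in for the real merging of trace queues.

```c
#include <assert.h>

/* Parent of a rank in a heap-style binary tree; the root (rank 0) has none. */
int tree_parent(int rank)
{
    assert(rank >= 0);
    return rank == 0 ? -1 : (rank - 1) / 2;
}

/* Store the children of rank that exist among size ranks; return how many. */
int tree_children(int rank, int size, int children[2])
{
    int n = 0;

    assert(rank >= 0 && rank < size);
    for (int k = 1; k <= 2; k++) {
        int c = 2 * rank + k;
        if (c < size)
            children[n++] = c;
    }
    return n;
}

/* Simulated bottom-up aggregation. merged[r] starts as the local amount of
 * trace data on rank r. Ranks are visited from highest to lowest, so every
 * child has already absorbed its own subtree before it is added to its
 * parent. On return, merged[0] holds the total for the whole tree. */
void simulate_merge(int *merged, int size)
{
    for (int r = size - 1; r > 0; r--)
        merged[tree_parent(r)] += merged[r];
}
```

The appeal of a tree over a flat gather is that the number of aggregation rounds grows with the depth of the tree, roughly the base-2 logarithm of the number of nodes, rather than with the number of nodes itself.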
A number of compilation flags defined in umpi_internal.h allow us to tweak the behaviour of the MPICTrace system and to enable or disable its debugging output.