It also seems to me that MPICTrace with timing deltas should be modified so that, instead of aggregating timing information during inter-node compression, we store the full timing information for each node. The basic idea behind timing deltas is to let us observe communication patterns, and some of that information is lost if we aggregate it across nodes. Intra-node compression seems reasonable, since it stays within a single node's own trace, but inter-node compression of timing information is less attractive. So we should try to store this information as is, particularly when nodes are asymmetric in performance or in placement within the cluster.
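
To make the concern concrete, here is a minimal Python sketch of what cross-node aggregation can hide. It is not MPICTrace code: the data layout (`per_node_deltas`), the helper names, and the choice of a (min, mean, max) summary are all assumptions made for illustration, and the real tool may aggregate differently.

```python
from statistics import mean

# Hypothetical timing deltas (seconds) between consecutive MPI events,
# recorded separately on each node. Node 3 is deliberately slower.
per_node_deltas = {
    0: [1.0, 1.0, 1.0],
    1: [1.0, 1.0, 1.0],
    2: [1.0, 1.0, 1.0],
    3: [4.0, 4.0, 4.0],
}

def aggregate_across_nodes(deltas):
    """Collapse each event to one (min, mean, max) tuple, dropping node identity.

    This stands in for inter-node compression of timing data (an assumption,
    not the actual MPICTrace scheme).
    """
    per_event = zip(*deltas.values())  # group the same event across all nodes
    return [(min(e), mean(e), max(e)) for e in per_event]

def slowest_node(deltas):
    """Return the node id with the largest total delta.

    Only answerable when the per-node data has been kept.
    """
    return max(deltas, key=lambda node: sum(deltas[node]))

summary = aggregate_across_nodes(per_node_deltas)
slow = slowest_node(per_node_deltas)

print("aggregated summary per event:", summary)
print("slowest node (needs per-node data):", slow)
```

The aggregated summary still shows that some node was slow on each event, but not which one, nor whether it was the same node every time. That is exactly the kind of pattern we lose when nodes are asymmetric.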