Although we were able to obtain production traces with the last patch, debug traces largely eluded us; both of the problematic benchmarks (CG and LU) terminated before writing out any trace data, while IS worked as anticipated.
At first, this looked like another Fortran-related problem, but it turned out to simply be a result of the poor compression on the CG and LU benchmarks. The first step in tracking down the problem was to comment out all code within #ifdef RSD_DEBUG / #endif blocks that did real work, then uncomment the blocks one at a time. printf debugging indicated that the failure was occurring somewhere in rsd_queue.c:compress_rsd, with a segfault around lines 553 and 554 when both output_rsd_node statements were in place (i.e., with either one commented out, it ran correctly).
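The comment-out-and-restore procedure can also be driven by a switch instead of by editing the source each time. The following is only a sketch of that idea, not the code used in the record library: the names demo_debug_mask, DEMO_DEBUG_BLOCK, demo_enable_only and demo_compress are hypothetical, and the three debug blocks are stand-ins for the real RSD_DEBUG sections.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical run-time mask: bit k enables debug block k. */
static unsigned demo_debug_mask = 0u;

/* Run stmt only when the given block is enabled in the mask. */
#define DEMO_DEBUG_BLOCK(bit, stmt) \
    do { if (demo_debug_mask & (1u << (bit))) { stmt; } } while (0)

/* Enable exactly one block, as when restoring blocks one at a time. */
static void demo_enable_only(unsigned bit)
{
    assert(bit < 32u);
    demo_debug_mask = 1u << bit;
}

/* Stand-in for a function containing several debug-only blocks.
 * Returns how many of those blocks actually ran. */
static int demo_compress(void)
{
    int ran = 0;
    DEMO_DEBUG_BLOCK(0, ran++; fprintf(stderr, "debug block 0\n"));
    DEMO_DEBUG_BLOCK(1, ran++; fprintf(stderr, "debug block 1\n"));
    DEMO_DEBUG_BLOCK(2, ran++; fprintf(stderr, "debug block 2\n"));
    return ran;
}
```

Calling demo_enable_only(1) before demo_compress() runs only the second block; setting the mask to 0x7 runs all three. A failure that appears only when two particular blocks are enabled together, as with the two output_rsd_node statements, shows up as a specific pair of bits.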
That function (and every function that it called) was non-mutating, so none of them should have been able to cause a crash on its own. The only anomalous feature present in all four functions involved was the use of sprintf into a stack buffer; converting all of these calls directly into fprintf calls mostly solved the issue.
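To illustrate the kind of conversion described, here is a minimal sketch; it is not the actual record library code. The struct demo_node type, its fields, the 32-byte buffer, and both function names are assumptions made for the example. The buffered variant uses snprintf rather than sprintf so that the sketch itself cannot overrun its buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record; the real RSD node layout is not shown here. */
struct demo_node {
    int op;
    long count;
    const char *label;
};

/* Pattern being removed: format into a fixed-size stack buffer, then
 * write the buffer out. With plain sprintf, text longer than buf would
 * be written past the end of the buffer. Returns 1 if the text did not
 * fit (and was truncated), 0 if it fit, -1 on a formatting error. */
static int demo_emit_buffered(FILE *out, const struct demo_node *n)
{
    char buf[32];
    int needed;

    assert(out != NULL && n != NULL && n->label != NULL);
    needed = snprintf(buf, sizeof buf, "op=%d count=%ld label=%s\n",
                      n->op, n->count, n->label);
    if (needed < 0)
        return -1;
    fputs(buf, out);
    return needed >= (int)sizeof buf;
}

/* Replacement: format straight into the stream, so no intermediate
 * buffer exists to overflow. Returns the number of characters written. */
static int demo_emit_direct(FILE *out, const struct demo_node *n)
{
    assert(out != NULL && n != NULL && n->label != NULL);
    return fprintf(out, "op=%d count=%ld label=%s\n",
                   n->op, n->count, n->label);
}
```

With a label longer than the buffer, demo_emit_buffered reports truncation, which is the case where an unchecked sprintf would have corrupted the stack, while demo_emit_direct writes the whole line regardless of length.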
That did not completely fix the problem, however; the segfault simply moved to a later point. There were additional calls to sprintf throughout the mpi-spec.* files; the modifications these calls needed were not immediately apparent, so we worked around the problem temporarily by increasing the size of those stack buffers (the TSIZE definition in umpi_internal.h at line 340).
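Enlarging a buffer only moves the limit, so it can help to check whether a given record would fit at all. The sketch below shows one way to do that with the C99 behaviour of snprintf, which returns the length it would have written when given a zero-size buffer. DEMO_TSIZE, its value of 64, the format string, and the function names are all hypothetical; the real TSIZE value and the format strings in the mpi-spec.* files are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical buffer size standing in for TSIZE. */
#define DEMO_TSIZE 64

/* Number of bytes, including the terminating NUL, needed to format
 * one record, or -1 if formatting fails. */
static int demo_required_size(const char *name, int rank, long bytes)
{
    int n;

    assert(name != NULL);
    /* With a NULL buffer and size 0, snprintf writes nothing and
     * returns the length the formatted text would have had. */
    n = snprintf(NULL, 0, "%s rank=%d bytes=%ld", name, rank, bytes);
    return n < 0 ? -1 : n + 1;
}

/* Nonzero if the formatted record fits in a DEMO_TSIZE-byte buffer. */
static int demo_fits(const char *name, int rank, long bytes)
{
    int need = demo_required_size(name, rank, bytes);
    return need > 0 && need <= DEMO_TSIZE;
}
```

A check like this, run in a debug build, would flag the records that exceed the buffer instead of letting them fail later as a segfault.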
With these two sets of changes, we can now (finally) get full debug traces on the LU and CG benchmarks. The full changes are available here in unified diff format. Apply the patch to a fresh copy of the record library (the patch also includes the changes that fix the Fortran linking issues) as follows:
cd /tmp
tar xzf ~/record.tgz
cd record
patch -p1 < /path/to/patch
Here are some sample partial traces:
A sample root trace (~35MB)
A sample non-root trace (~3MB)