TBDA: Trace Based Dependence Analyzer
Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. [1]. Static compiler analysis of a program may lead to conservative dependence graphs, some of whose edges might be exercised with extremely low probability during most execution runs. Also, imprecise pointer analysis may add may-dependence edges [2] that might prevent aggressive compiler optimizations from being applied on the program. TBDA proposes building a dynamic dependence graph (DDG) [3] based on memory reference traces obtained during a typical run of the program. TBDA also proposes adorning the edges in the dependence graph with distance and direction vectors [4] and probability of the dependence edge being exercised, which can be used by a compiler to base heuristics for speculative or user-directed parallelization.
Konrad Lai states in his position statement that “with the advent of multi-cores, the academic and industry must solve the parallel programming problem for a broad range of applications with your typical good programmers, not just rocket scientists.” There exists a lot of code that has not not been parallelized and which has been written in languages like C/C++ on which pointer analysis is difficult. Standard compilers like gcc implements very limited and conservative pointer analysis. More powerful research compilers capable of performing more aggressive analysis and optimizations are not easily available or are broken. In that light, a tool that makes dependence information available to the user and compiler can guide the compiler to performing optimizations which it couldn't using it's own analysis. A user may also use the information from the dependence analyzer to quickly identify potentially parallel regions in the program.
The proposed framework extends Intel's Open Research Compiler to insert instrumentation code in the WHIRL IR in it's very high level. The instrumentation would trace the addresses of scalar and array loads and stores. It would also trace beginnings and endings of different iterations of loop together with function beginnings and endings for each call site. The instrumented IR is then lowered by ORC into and linked with a tracing library to produce the instrumented binary.
The instrumented binary is then run to produce an execution trace. The dependence analyzer is then invoked over this trace file and a dynamic dependence graph is built. Using the loop begin and end markers, distance vectors are calculated for each dependence edge and are aggregated into direction vectors. These distance and direction vectors are then made available to the user or a compiler and can be used to guide parallelization decisions and base heuristics for selection of speculative threads for TLS systems.
The general working of the framework is shown in the figure below:
Figure1: Tracing and Dependence Framework