CSC 766 Project 1 – Progress Report 1

Background

The goal of the project is to perform a design-space exploration to find the set of optimizations that provides the highest performance improvement on speculative code. This exploration differs from its non-speculative counterpart in that it focuses only on those optimizations that are enabled by certain speculative transformations. The transformations speculate on low-probability events. One example of such a low-probability event is aliasing between a load and a set of stores: provided that the load rarely aliases with the stores, it is possible to hoist the load across them. This transformation may then allow optimizations like register allocation and code scheduling to move code around and improve ILP (Instruction-Level Parallelism) and MLP (Memory-Level Parallelism).
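For illustration only, a minimal, hypothetical example of such a hoist might look like the following C-style code; the function and variable names are placeholders and not taken from any benchmark.

    /* Original loop: the load of *q cannot legally be hoisted above the
       store to p[i] unless the compiler can prove that p and q never alias. */
    void scale(int *p, int *q, int n) {
        for (int i = 0; i < n; ++i) {
            p[i] = p[i] * 2;   /* store that may alias *q          */
            p[i] += *q;        /* load repeated on every iteration */
        }
    }

    /* Speculative version: the load is hoisted out of the loop on the
       assumption that p and q rarely alias. The rare aliasing case must be
       detected at run time and trigger recovery. */
    void scale_speculative(int *p, int *q, int n) {
        int s = *q;            /* speculatively hoisted load */
        for (int i = 0; i < n; ++i) {
            p[i] = p[i] * 2;   /* misspeculation if this store writes *q */
            p[i] += s;
        }
    }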

Progress towards Milestones

In the project proposal, I stated that I would try to implement function cloning and perform different passes on each version of the function by November 10th. Towards this end, I have implemented a pass that generates multiple versions of a function using LLVM. I am currently running different optimizations on the generated functions and evaluating their performance, and I hope to have results in the next few days.

I have also started implementing the infrastructure in PIN required to support speculation using bloom filters. The PIN tool has to work with the LLVM-generated binary to track speculation. I have yet to start implementing the first-order-accurate timing model in PIN; I plan to implement a cache simulator in PIN for this purpose.
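As a rough sketch of the direction I have in mind (not the final implementation), the filter would record the addresses written by the stores that a load was hoisted across, so that the hoisted load's address can be checked against them. The sizes and hash functions below are placeholders.

    #include <cstdint>
    #include <cstring>

    // Minimal bloom filter over memory addresses. False positives are
    // possible, false negatives are not, which is exactly the conservative
    // behavior needed to detect misspeculation.
    class AddressBloomFilter {
        static const size_t kBits = 1 << 16;          // filter size (assumption)
        uint8_t bits_[kBits / 8];

        static size_t hash1(uint64_t addr) { return (addr >> 3) % kBits; }
        static size_t hash2(uint64_t addr) { return (addr * 2654435761ULL) % kBits; }

        void set(size_t i)        { bits_[i / 8] |= (1u << (i % 8)); }
        bool test(size_t i) const { return bits_[i / 8] & (1u << (i % 8)); }

    public:
        AddressBloomFilter() { clear(); }
        void clear() { std::memset(bits_, 0, sizeof(bits_)); }

        // Record the address of a store the load was hoisted across.
        void insert(uint64_t addr) { set(hash1(addr)); set(hash2(addr)); }

        // Does the hoisted load possibly alias with a recorded store?
        bool mayContain(uint64_t addr) const {
            return test(hash1(addr)) && test(hash2(addr));
        }
    };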

LLVM – Function Cloning

LLVM broadly categorizes passes into two types: analysis passes and transformation passes. Analysis passes inspect the code and collect information without modifying it; an example is the dominator tree pass, which constructs the dominator tree without changing the code in any way. Transformation passes, on the other hand, modify the application code; an example is loop-invariant code motion, which moves loop-invariant instructions out of the loop. Passes can also be categorized as module-, function-, and loop-level passes. Module-level passes analyze or transform the whole program and are the passes used for inter-procedural work. Function-level passes work at the level of a single function, and the same applies to loop-level passes. The reason for this classification is simple: LLVM does not expect a function-level pass to touch other functions, so it is important not to modify other functions within a function-level pass.

Function cloning requires creating multiple versions of a function and updating all of its call sites, so it has to be done at the module level. Since the call sites of indirect calls cannot always be determined statically, function cloning is restricted to direct calls. The steps for performing function cloning are:

1. Clone the basic blocks, along with the control and data flow between them. In LLVM IR, different functions may reuse the same SSA names internally, so it is not strictly necessary to create new SSA names for the variables in the cloned function. (A sketch of such a cloning pass is given after this list.)

2. Set the calling convention of the newly cloned function to match that of the original function.

3. Split the basic block at each call site of the function. In LLVM IR, a CallInst is not a block-terminating instruction, so it is necessary to split the block before the call instruction. A further split is required so that there can be multiple call blocks, each taking us to a different version of the function. The original block terminator is then replaced with a switch (LLVM IR provides this instruction with semantics similar to the high-level C construct), which chooses between the different versions of the function. (See the call-site rewriting sketch after this list.)

4. Before the switch, insert a checkpoint call that tells PIN to log memory so it can be recovered in case of misspeculation. The actual rolling back of the PC is handled in PIN.

5. Using its timing simulator, PIN calculates the time taken to execute each version of the function. In the inspector phase, all versions of the function are run on each call, and the version taking the least time to execute (taking misspeculations into account) qualifies for the execution phase, where it becomes the default target of the call. (A minimal PIN sketch of this per-version bookkeeping follows the list.)
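As a concrete illustration of steps 1 and 2, the following is a minimal sketch of the module-level cloning pass, written against the legacy pass manager and assuming a recent LLVM in which CloneFunction inserts the clone into the module. The pass name, the ".spec" suffix, and the no-inline marking are conventions chosen here for illustration, not anything mandated by LLVM.

    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Pass.h"
    #include "llvm/Transforms/Utils/Cloning.h"
    #include "llvm/Transforms/Utils/ValueMapper.h"
    #include <vector>

    using namespace llvm;

    namespace {
    // Module-level pass, so that clones can be created and, later, every
    // direct call site of the original can be rewritten.
    struct SpeculativeCloning : public ModulePass {
      static char ID;
      SpeculativeCloning() : ModulePass(ID) {}

      bool runOnModule(Module &M) override {
        std::vector<Function *> Worklist;
        for (Function &F : M)
          if (!F.isDeclaration())
            Worklist.push_back(&F);          // avoid cloning while iterating M

        for (Function *F : Worklist) {
          ValueToValueMapTy VMap;
          // Step 1: CloneFunction copies the blocks, instructions, and SSA
          // values of F into a new function in the same module.
          Function *Spec = CloneFunction(F, VMap);
          Spec->setName(F->getName().str() + ".spec");
          // Step 2: match the original's calling convention, and keep the
          // clone out of line so PIN can see it as a separate routine.
          Spec->setCallingConv(F->getCallingConv());
          Spec->addFnAttr(Attribute::NoInline);
        }
        return !Worklist.empty();
      }
    };
    } // end anonymous namespace

    char SpeculativeCloning::ID = 0;
    static RegisterPass<SpeculativeCloning> X("spec-clone",
        "Create a speculative clone of each function (sketch)");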
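Steps 3 and 4 amount to the call-site rewriting sketched below. This is a simplified version that handles a single clone and only direct CallInst sites (no invokes); spec_select_version and spec_checkpoint stand for hypothetical runtime hooks, declared elsewhere in the module as returning i32 and void respectively, that the PIN tool would intercept.

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Instructions.h"

    using namespace llvm;

    // CI is a direct call to the original function; Spec is its clone.
    // Selector (e.g. spec_select_version, returning i32) picks the version,
    // Checkpoint (e.g. spec_checkpoint) asks PIN to log memory first.
    static void rewriteCallSite(CallInst *CI, Function *Spec,
                                Function *Selector, Function *Checkpoint) {
      LLVMContext &Ctx = CI->getContext();
      BasicBlock *Head = CI->getParent();

      // Step 3: split before the call, and again right after it, so the call
      // sits in its own block and Cont is the common continuation.
      BasicBlock *Tail = Head->splitBasicBlock(CI, "call.orig");
      BasicBlock *Cont = Tail->splitBasicBlock(CI->getNextNode(), "call.cont");

      // Build a block that calls the speculative clone instead.
      BasicBlock *SpecBB = BasicBlock::Create(Ctx, "call.spec",
                                              Head->getParent(), Cont);
      IRBuilder<> SB(SpecBB);
      SmallVector<Value *, 8> Args(CI->arg_begin(), CI->arg_end());
      CallInst *SpecCall = SB.CreateCall(Spec, Args);
      SpecCall->setCallingConv(Spec->getCallingConv());
      SB.CreateBr(Cont);

      // Step 4: checkpoint before speculating, then replace the unconditional
      // branch created by splitBasicBlock with a switch over the versions.
      Instruction *OldBr = Head->getTerminator();
      IRBuilder<> HB(OldBr);
      HB.CreateCall(Checkpoint);
      Value *Sel = HB.CreateCall(Selector);
      SwitchInst *SW = SwitchInst::Create(Sel, /*Default=*/Tail, 1, OldBr);
      SW->addCase(ConstantInt::get(Type::getInt32Ty(Ctx), 1), SpecBB);
      OldBr->eraseFromParent();

      // Merge the two return values so later uses see a single definition.
      if (!CI->getType()->isVoidTy()) {
        PHINode *Phi = PHINode::Create(CI->getType(), 2, "call.merge",
                                       &*Cont->begin());
        CI->replaceAllUsesWith(Phi);
        Phi->addIncoming(CI, Tail);
        Phi->addIncoming(SpecCall, SpecBB);
      }
    }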
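On the PIN side, step 5 needs a way to attribute execution cost to each version of a function. Since the timing model is not written yet, the following is only a rough sketch that tallies instructions executed per routine, using the ".spec" naming convention assumed in the cloning sketch above; the real tool will use the planned cache simulator rather than a raw instruction count, and the routine-name filter here is a placeholder.

    #include "pin.H"
    #include <iostream>
    #include <map>
    #include <string>

    // Per-version cost tally: routine name -> instructions executed.
    static std::map<std::string, UINT64> Cost;

    static VOID CountIns(UINT64 *counter) { (*counter)++; }

    static VOID InstrumentRoutine(RTN rtn, VOID *)
    {
        std::string name = RTN_Name(rtn);
        // Placeholder filter: only the cloned versions (".spec" suffix); the
        // real tool would also match the original functions by name.
        if (name.find(".spec") == std::string::npos) return;

        UINT64 *counter = &Cost[name];
        RTN_Open(rtn);
        for (INS ins = RTN_InsHead(rtn); INS_Valid(ins); ins = INS_Next(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)CountIns,
                           IARG_PTR, counter, IARG_END);
        RTN_Close(rtn);
    }

    static VOID Fini(INT32, VOID *)
    {
        for (std::map<std::string, UINT64>::const_iterator it = Cost.begin();
             it != Cost.end(); ++it)
            std::cerr << it->first << ": " << it->second << " instructions\n";
    }

    int main(int argc, char *argv[])
    {
        PIN_InitSymbols();                  // so RTN_Name returns symbol names
        if (PIN_Init(argc, argv)) return 1;
        RTN_AddInstrumentFunction(InstrumentRoutine, 0);
        PIN_AddFiniFunction(Fini, 0);
        PIN_StartProgram();                 // never returns
        return 0;
    }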

Issues with function cloning arose from a calling-convention mismatch between the original function and the clone. It turns out that LLVM maintains this information within the compiler, so it needs to be explicitly copied from one version to the other. Another issue is keeping LLVM and PIN in agreement on the calling convention: LLVM must not inline the cloned function, and it must keep the calling convention at the conventional C standard so that PIN can read all the arguments correctly.

Immediate Goals

In the next ten days, I hope to finish implementing the PIN tool and the cache simulator, which should ensure that I can speculate and recover properly. With this infrastructure in place, I will be able to perform the aforementioned design-space exploration and gather useful information.