Example 1: Tuning SPE performance with static and dynamic timing analysis
- Static analysis of SPE threads
The listing below shows an spu_timing static timing analysis for the inner loop of the SPE code.
- Dynamic analysis of SPE threads
The listing below shows a dynamic timing analysis on the same SPE inner loop using the IBM Full System Simulator for the Cell Broadband Engine.
- Optimizations
To eliminate stalls and improve the CPI, and ultimately the performance, the compiler needs more independent instructions to schedule between dependent ones. The SPE's large register file allows the compiler or the programmer to unroll loops, which supplies those instructions.
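As a rough illustration of loop unrolling, here is a minimal sketch in plain scalar C. It is not the tutorial's SPE code: the function names dot_rolled and dot_unrolled4 are hypothetical, no SPU intrinsics are used, and the unrolled version assumes the element count is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline loop (hypothetical example): every add depends on the previous
   iteration's sum, so the scheduler has little independent work to place
   between dependent instructions. */
float dot_rolled(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by four with four independent accumulators. Each accumulator
   forms its own dependency chain, giving the compiler more instructions
   to interleave. Assumption: n is a multiple of 4. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (s0 + s1) + (s2 + s3);
}
```

The extra accumulators are what consume additional registers. Because the unrolled version adds the products in a different order, its floating-point result can differ from the rolled version in the last bits for general input.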
- Static analysis of optimizations
The listing below shows an spu_timing static timing analysis for the optimized SPE thread (process_buffer subroutine only).
- Dynamic analysis of optimizations
The listing below shows a dynamic timing analysis on the IBM Full System Simulator for the Cell Broadband Engine for the optimized SPE thread (process_buffer subroutine only). It reports that 78 registers are used, which is 60.94 percent of the SPE's 128 registers.
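The used-register percentage can be checked by hand. The sketch below assumes the figure is simply the registers used divided by the 128 registers in the SPE register file; the helper name register_used_percent and the constant SPE_REGISTER_COUNT are hypothetical and are not part of the simulator.

```c
#include <assert.h>

/* The SPE register file holds 128 registers. */
#define SPE_REGISTER_COUNT 128

/* Hypothetical helper: percentage of the register file in use. */
double register_used_percent(int used, int total)
{
    assert(total > 0);
    return 100.0 * (double)used / (double)total;
}
```

Calling register_used_percent(78, SPE_REGISTER_COUNT) gives 60.9375, which rounds to the 60.94 quoted above.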