Performance issues

Two software tools are available in the SDK to assist in measuring the performance of programs: the spu-timing static timing analyzer, and the IBM Full System Simulator for the Cell Broadband Engine.

The spu-timing analyzer performs a static timing analysis of a program by annotating its assembly instructions with the instruction-pipeline state. This analysis is useful for coarsely spotting dual-issue rates (odd and even pipeline use) and for assessing which program sections may be experiencing instruction-dependency and data-dependency stalls. For example, it can help determine whether dependencies might be mitigated by unrolling, or whether reordering instructions or placing no-ops better will improve the dual-issue behavior in a loop. However, static analysis typically does not provide numerical performance information about program execution. Thus, it cannot report anything definitive about cycle counts, branches taken or not taken, branches hinted or not hinted, DMA transfers, and so forth.
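As an illustration of the kind of dependency that unrolling can mitigate, the following minimal sketch contrasts a reduction loop whose additions form one serial chain with a version unrolled by four that keeps independent partial sums. It is plain C rather than SPU intrinsics, and the function names sum_serial and sum_unrolled4 are hypothetical, chosen only for this example. Whether the unrolled form actually schedules better on a given target is something the annotated assembly would have to confirm; a compiler may also unroll the loop on its own.

```c
#include <stddef.h>

/* Baseline: each addition needs the previous value of sum, so the
   additions form a single serial dependency chain. */
float sum_serial(const float *a, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by four with independent partial sums. The four additions in
   the loop body do not depend on one another, which gives the
   instruction scheduler independent work to overlap. */
float sum_unrolled4(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: consume four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop: handle the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the unrolled version can differ from the serial one in the last bits of the result for general inputs. That trade-off should be weighed before applying the transformation.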

The IBM Full System Simulator for the Cell Broadband Engine performs a dynamic analysis of program execution. Any part of a program, from a single line to the entire program, can be studied. Performance numbers are provided for:

Instruction histograms (for example, branch, hint, and prefetch)
Cycles per instruction (CPI)
Single-issue and dual-issue rates
Stall statistics
Register use

The output of the IBM Full System Simulator for the Cell Broadband Engine can be a text listing or a graphic plot.
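To make two of the metrics listed above concrete, the sketch below derives cycles per instruction and a dual-issue rate from raw counts. The struct perf_counts and its field names are hypothetical and are not taken from the simulator's output format. The dual-issue rate is assumed here to mean the fraction of all cycles in which two instructions were issued; the simulator may define it differently.

```c
#include <stdint.h>

/* Hypothetical raw counters for a measured region. The field names are
   illustrative only and do not mirror the simulator's output. */
struct perf_counts {
    uint64_t cycles;            /* total cycles in the region        */
    uint64_t instructions;      /* instructions completed            */
    uint64_t dual_issue_cycles; /* cycles that issued two instructions */
};

/* Cycles per instruction: total cycles divided by instructions.
   Returns 0.0 when no instructions were counted. */
double cpi(const struct perf_counts *c)
{
    if (c->instructions == 0)
        return 0.0;
    return (double)c->cycles / (double)c->instructions;
}

/* Dual-issue rate under the assumption stated above: the fraction of
   cycles in which both pipelines issued. Returns 0.0 for zero cycles. */
double dual_issue_rate(const struct perf_counts *c)
{
    if (c->cycles == 0)
        return 0.0;
    return (double)c->dual_issue_cycles / (double)c->cycles;
}
```

For example, a region that takes 1000 cycles to complete 800 instructions has a CPI of 1.25. If 250 of those cycles issued two instructions, the dual-issue rate under this definition is 0.25.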