Reducing the impact of branches

The SPU hardware assumes linear instruction flow, and produces no stall penalties from sequential instruction execution. A branch instruction can disrupt this assumed sequential flow.

Correctly predicted branches execute in one cycle, but a mispredicted branch (conditional or unconditional) incurs a penalty of approximately 18 to 19 cycles. Given that typical SPU instruction latencies are two to seven cycles, mispredicted branches can seriously degrade program performance. Branches also create scheduling barriers, reducing the opportunity for dual issue and for covering up dependency stalls.

The most effective means of reducing the impact of branches is to eliminate them, using three primary methods: inlining, unrolling, and predication. The next most effective means is to use the branch-hint instructions.
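
As an illustration of predication, the following is a minimal sketch in portable C, not SPU-specific code. The function names max_branchy and max_select are hypothetical and chosen only for this example. The idea is to compute both candidate results and pick one with a bit mask, so no conditional branch is needed. On the SPU, a compiler would typically be able to express this pattern as a compare followed by a select-bits operation, though that depends on the toolchain.

```c
#include <stdint.h>

/* Branchy version: the compiler may emit a conditional branch for the if. */
static int32_t max_branchy(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Predicated version (hypothetical helper): both inputs are available,
 * and a mask derived from the comparison selects one of them bitwise. */
static int32_t max_select(int32_t a, int32_t b)
{
    /* (a > b) is 1 or 0; negating gives all ones or all zeros. */
    uint32_t mask = (uint32_t)-(int32_t)(a > b);

    /* Keep the bits of a where mask is set, the bits of b elsewhere. */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

Both functions return the same value for every pair of inputs; the second simply trades a possible branch for a few extra arithmetic instructions, which is usually worthwhile when the branch is hard to predict.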

If a branch hint is provided, software speculates that the instruction branches to the target path. If a hint is not provided, software speculates that the branch is not taken (that is, instruction execution continues sequentially). If either speculation is incorrect, there is a large penalty (flush and refetch).
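
One common way to give the compiler the information it needs to place a hint is the __builtin_expect builtin provided by GCC and Clang. The sketch below assumes such a compiler; the LIKELY and UNLIKELY macros and the sum_nonnegative function are hypothetical names introduced for this example. Whether a branch-hint instruction is actually emitted depends on the toolchain and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convenience macros; __builtin_expect is a GCC/Clang builtin.
 * On other compilers the macros fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums n values, treating a negative entry as a rare error case.
 * Returns -1 as soon as a negative entry is found. */
static int64_t sum_nonnegative(const int32_t *values, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Tell the compiler the error path is rarely taken, so the
         * common path can remain the sequential fall-through. */
        if (UNLIKELY(values[i] < 0))
            return -1;
        total += values[i];
    }
    return total;
}
```

The annotation does not change the result of the function; it only states which outcome is expected, so that the frequent path avoids the flush-and-refetch penalty.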