General-purpose processors have typically addressed branch prediction
by supporting hardware look-asides with branch history tables (BHT), branch
target address caches (BTAC), or branch target instruction caches (BTIC).
The SPU addresses branch prediction through a set of hint for branch (HBR)
instructions that facilitate efficient branch processing by allowing programs
to avoid the penalty of taken branches.
- If a branch hint is provided, software speculates that the instruction
branches to the target path.
- If a hint is not provided, software speculates that the instruction does
not branch to a new location (that is, it stays inline).
- If speculation is incorrect, the speculated branch is flushed and refetched.
It is possible to sequence multiple hints in advance of multiple branches.
As with all programmer-provided hints, care must be exercised when using branch
hints because, if the information provided is incorrect, performance might
degrade.
Branch-hint instructions can provide three kinds of advance knowledge about
future branches:
- Address of the branch target (that is, where the branch transfers the
flow of control)
- Address of the actual branch instruction (known as the hint-trigger
address)
- Prefetch schedule (when to initiate prefetching instructions at the branch
target)
Branch-hint instructions load a branch-target buffer (BTB) in the SPU.
When the BTB is loaded with a branch target, the hint-trigger address and
branch address are also loaded into the BTB. After loading, the BTB monitors
the instruction stream as it goes into the issue stage of the pipeline. When
the address of the instruction going into issue matches the hint-trigger address,
the hint is triggered, and the SPU speculates to the target address in the
hint buffer.
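The triggering behavior described above can be sketched as a small software model. This is only an illustration under stated assumptions, not a description of the actual hardware: the type btb_entry and the functions btb_load and next_fetch_addr are hypothetical names, the model holds a single hint, and it assumes 4-byte instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry model of the branch-target buffer (BTB). */
typedef struct {
    bool     valid;        /* true once a hint has been loaded        */
    uint32_t trigger_addr; /* hint-trigger address (the branch itself) */
    uint32_t target_addr;  /* address the branch is hinted to reach    */
} btb_entry;

/* Model of a branch-hint instruction: loading replaces any earlier hint. */
static void btb_load(btb_entry *btb, uint32_t trigger, uint32_t target)
{
    btb->valid = true;
    btb->trigger_addr = trigger;
    btb->target_addr = target;
}

/* Address speculated next for the instruction entering issue.
 * A match on the hint-trigger address redirects to the hinted target;
 * otherwise the model continues inline (assumed 4-byte instructions). */
static uint32_t next_fetch_addr(const btb_entry *btb, uint32_t issue_addr)
{
    if (btb->valid && issue_addr == btb->trigger_addr)
        return btb->target_addr;
    return issue_addr + 4u;
}
```

In this model the entry stays valid after it triggers, so a hint loaded once keeps redirecting the same branch until another hint replaces it.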
Branch-hint instructions have no program-visible effects. They provide
a hint to the SPE architecture about a future branch instruction, with the
intention that the information be used to improve performance by prefetching
the branch target. The SPE branch-hint instructions are shown in
Table 1.
There are immediate and indirect forms for this instruction class. The location
of the branch is always specified by an immediate operand in the instruction.
Table 1. Branch-Hint Instructions
Instruction | Description |
hbr s11, ra | Hint for branch (r-form). Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended, 11-bit value s11 will branch to the address contained in word element 0 of register ra. This form is used to hint function returns, pointer function calls, and other situations that give rise to indirect branches. |
hbra s11, s18 | Hint for branch (a-form). Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended, 11-bit value s11 will branch to the address specified by the sign-extended, 18-bit value s18. |
hbrr s11, s18 | Hint for branch relative. Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended, 11-bit value s11 will branch to the address specified by the sum of the address of the current instruction and the sign-extended, 18-bit value s18. |
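The address arithmetic in the table descriptions can be illustrated in C. This is a minimal sketch that follows the wording of the table only: the function names are hypothetical, s11 and s18 are treated as raw bit fields already expressed in bytes, and any wrapping of addresses to the local-store size is not modeled.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a signed 32-bit integer. */
static int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sign = 1u << (bits - 1u);
    value &= mask;
    return (int32_t)(value ^ sign) - (int32_t)sign;
}

/* Address of the hinted branch: hint instruction address plus s11. */
static uint32_t hinted_branch_address(uint32_t hint_pc, uint32_t s11)
{
    return hint_pc + (uint32_t)sign_extend(s11, 11);
}

/* hbra: the target is the sign-extended 18-bit value itself. */
static uint32_t hbra_target(uint32_t s18)
{
    return (uint32_t)sign_extend(s18, 18);
}

/* hbrr: the target is relative to the hint instruction address. */
static uint32_t hbrr_target(uint32_t hint_pc, uint32_t s18)
{
    return hint_pc + (uint32_t)sign_extend(s18, 18);
}
```

For example, with the hint instruction at 0x1000, an s11 field of 0x040 names a branch at 0x1040, while 0x7F0 (which sign-extends to -16) names a branch at 0xFF0.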
The following rules apply to the hint for branch (HBR) instructions:
- An HBR instruction should be placed at least 11 cycles plus four
instruction pairs ahead of the branch instruction that it hints.
In other words, an HBR instruction must be followed by at least 11 cycles
of instructions, followed by eight instructions aligned on an even address
boundary. More separation between the hint and branch improves the performance
of applications on future SPU implementations.
- If an HBR instruction is placed too close to the branch, a hint stall
results: the branch instruction stalls until the timing requirement of the
HBR instruction is satisfied.
- If an HBR instruction is placed closer to the hint-trigger address than
four instruction pairs plus one cycle, then the hint stall does not occur
and the HBR is not used.
- Only one HBR instruction can be active at a time. Issuing another HBR
cancels the current one.
- An HBR instruction can be moved outside of a loop and will be effective
on each loop iteration as long as another HBR instruction is not executed.
- The HBR instruction must be placed within 255 instructions of the branch
instruction.
- The HBR instruction only affects performance.
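The 255-instruction limit in the rules above can be checked from addresses alone. The following is a rough sketch, assuming 4-byte instructions and assuming the limit applies symmetrically in both directions; the function hint_within_range and the constants are hypothetical names. The cycle-based timing rules depend on instruction scheduling and are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    SPU_INSN_BYTES         = 4,   /* assumed fixed instruction size      */
    HBR_MAX_DISTANCE_INSNS = 255  /* limit stated in the placement rules */
};

/* Returns true when the branch lies within 255 instructions of the hint.
 * Assumption: the distance may be forward or backward. */
static bool hint_within_range(uint32_t hint_addr, uint32_t branch_addr)
{
    int64_t delta = (int64_t)branch_addr - (int64_t)hint_addr;

    /* Both addresses must refer to instruction-aligned locations. */
    if (delta % SPU_INSN_BYTES != 0)
        return false;

    int64_t insns = delta / SPU_INSN_BYTES;
    if (insns < 0)
        insns = -insns;
    return insns <= HBR_MAX_DISTANCE_INSNS;
}
```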
The HBR instructions can be used to support multiple strategies of branch
prediction. These include:
- Static Branch Prediction — Prediction based upon branch type or
displacement, and prediction based upon profiling or linguistic hints.
- Dynamic Branch Prediction — Software caching of branch-target addresses,
and using control flow to record branching history.
A common approach to generating static branch prediction is to use expert
knowledge that is obtained either by feedback-directed optimization techniques
or using linguistic hints supplied by the programmer.
The document C/C++ Language Extensions for Cell Broadband Engine Architecture
defines a mechanism for directing branch prediction. The __builtin_expect
directive allows programmers to predict conditional program statements. The
following example demonstrates how a programmer can predict that a conditional
statement is false (a is not larger than b).
if (__builtin_expect((a > b), 0))
    c += a;
else
    d += 1;
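As a self-contained illustration, the same conditional can be wrapped in a function. This is only a sketch: the helper name update and its pointer parameters are hypothetical, and it assumes a compiler that provides __builtin_expect (GCC-compatible compilers do). The hint changes only how the compiler lays out and hints the branch, not the result of the code.

```c
/* Hypothetical helper wrapping the example's conditional.
 * The second argument of __builtin_expect (0) states that the
 * condition (a > b) is expected to be false. */
static void update(int a, int b, int *c, int *d)
{
    if (__builtin_expect((a > b), 0))
        *c += a;   /* predicted to be the unlikely path */
    else
        *d += 1;   /* predicted to be the likely path   */
}
```

Calling update(1, 2, &c, &d) takes the predicted path and increments d; calling update(5, 2, &c, &d) still produces the correct result (c grows by 5), but on the mispredicted path.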
Not only can the __builtin_expect directive be used for static branch
prediction, it can also be used for dynamic branch prediction.