Reducing branch mispredicts with branch hints

General-purpose processors have typically addressed branch prediction by supporting hardware look-asides with branch history tables (BHT), branch target address caches (BTAC), or branch target instruction caches (BTIC).

The SPU addresses branch prediction through a set of hint for branch (HBR) instructions that facilitate efficient branch processing by allowing programs to avoid the penalty of taken branches.

If a branch hint is provided, software is speculating that the branch will be taken, and the SPU fetches along the target path.
If no hint is provided, the SPU speculates that the branch is not taken; that is, execution continues inline.
If the speculation is incorrect, the speculatively fetched instructions are flushed and the correct path is refetched.
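To make the cost of wrong speculation concrete, the following is a minimal model of the rule above: a hinted branch is speculated taken, an unhinted one is speculated not taken, and every mismatch pays a flush-and-refetch penalty. The penalty value `ASSUMED_MISPREDICT_PENALTY` and all function names are hypothetical placeholders chosen for illustration, not figures or APIs from this text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: placeholder penalty in cycles, not a figure from this text. */
enum { ASSUMED_MISPREDICT_PENALTY = 18 };

/* A hinted branch is speculated taken; an unhinted branch is speculated
   not taken (execution continues inline). */
static bool speculates_taken(bool hinted)
{
    return hinted;
}

/* Modeled flush-and-refetch cost of one execution of the branch. */
static int branch_cost(bool hinted, bool taken)
{
    return speculates_taken(hinted) == taken ? 0 : ASSUMED_MISPREDICT_PENALTY;
}

/* Sum the modeled penalty over a recorded sequence of branch outcomes. */
static int total_penalty(bool hinted, const bool *outcomes, size_t n)
{
    assert(outcomes != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += branch_cost(hinted, outcomes[i]);
    return sum;
}
```

Under this model, a loop-closing branch that is taken three times and then falls through pays one penalty when hinted and three penalties when left unhinted.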

It is possible to sequence multiple hints in advance of multiple branches. As with all programmer-provided hints, care must be exercised when using branch hints because, if the information provided is incorrect, performance might degrade.

Branch-hint instructions can provide three kinds of advance knowledge about future branches:

Address of the branch target (that is, where the branch will transfer control)
Address of the actual branch instruction (known as the hint-trigger address)
Prefetch schedule (when to initiate prefetching instructions at the branch target)

Branch-hint instructions load a branch-target buffer (BTB) in the SPU. When the BTB is loaded with a branch target, the hint-trigger address (the address of the branch instruction) is loaded with it. After loading, the BTB monitors the instruction stream as it enters the issue stage of the pipeline. When the address of the instruction entering issue matches the hint-trigger address, the hint is triggered, and the SPU speculates to the target address held in the BTB.
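The trigger mechanism can be pictured as a small state machine. The sketch below is a simplified software model, not hardware behavior: the `btb_t` type and the `btb_load` and `btb_next_fetch` functions are hypothetical names, the model has a single entry (matching the one-active-hint rule), and it assumes 4-byte instructions with purely sequential fetch when no hint matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-entry model of the SPU branch-target buffer. */
typedef struct {
    bool     valid;        /* a hint is currently loaded               */
    uint32_t trigger_addr; /* hint-trigger address (the branch itself) */
    uint32_t target_addr;  /* where the hinted branch is expected to go */
} btb_t;

/* Executing an HBR loads the BTB; a newer hint replaces the older one. */
static void btb_load(btb_t *btb, uint32_t trigger, uint32_t target)
{
    assert(btb != NULL);
    btb->valid = true;
    btb->trigger_addr = trigger;
    btb->target_addr = target;
}

/* Called for each instruction address entering the issue stage.
   Returns the next fetch address: the hinted target when the issuing
   instruction matches the trigger, otherwise the next sequential
   instruction (ASSUMPTION: 4-byte instructions, no other redirection).
   The entry stays valid, so the hint fires again on later iterations. */
static uint32_t btb_next_fetch(const btb_t *btb, uint32_t issue_addr)
{
    assert(btb != NULL);
    if (btb->valid && issue_addr == btb->trigger_addr)
        return btb->target_addr;
    return issue_addr + 4;
}
```

Because the entry is not cleared when it fires, the model also reflects why a hint hoisted out of a loop keeps working until another HBR replaces it.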

Branch-hint instructions have no program-visible effects. They give the SPU advance information about a future branch instruction so that it can improve performance by prefetching the branch target. The SPE branch-hint instructions are shown in Table 1. This instruction class has immediate and indirect forms. The location of the branch is always specified by an immediate operand in the instruction.

Table 1. Branch-Hint Instructions
Instruction	Description
hbr s11, ra	Hint for branch (r-form). Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended 11-bit value `s11` will branch to the address contained in word element 0 of register `ra`. This form is used to hint function returns, calls through function pointers, and other situations that give rise to indirect branches.
hbra s11, s18	Hint for branch (a-form). Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended 11-bit value `s11` will branch to the address specified by the sign-extended 18-bit value `s18`.
hbrr s11, s18	Hint for branch relative. Hint that the instruction addressed by the sum of the address of the current instruction and the sign-extended 11-bit value `s11` will branch to the address specified by the sum of the address of the current instruction and the sign-extended 18-bit value `s18`.
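The address arithmetic in Table 1 can be written out directly. This sketch treats `s11` and `s18` as raw bit fields holding byte quantities, as the table describes them; it does not model the actual instruction encoding, and it ignores any local-store address wrap-around. The helper names (`sign_extend`, `hint_trigger`, `hbra_target`, `hbrr_target`, `hbr_target`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of a raw immediate field. */
static int32_t sign_extend(uint32_t field, unsigned bits)
{
    assert(bits >= 1 && bits <= 30);
    int32_t v = (int32_t)(field & ((1u << bits) - 1u));
    if (v & (1 << (bits - 1)))   /* sign bit set: value is negative */
        v -= (1 << bits);
    return v;
}

/* All three forms: the hinted branch is at current address + s11. */
static uint32_t hint_trigger(uint32_t pc, uint32_t s11)
{
    return pc + (uint32_t)sign_extend(s11, 11);
}

/* hbr (r-form): target is word element 0 of ra, passed in as a value. */
static uint32_t hbr_target(uint32_t ra_word0)
{
    return ra_word0;
}

/* hbra (a-form): target is the sign-extended 18-bit value itself. */
static uint32_t hbra_target(uint32_t s18)
{
    return (uint32_t)sign_extend(s18, 18);
}

/* hbrr (relative): target is current address + sign-extended s18. */
static uint32_t hbrr_target(uint32_t pc, uint32_t s18)
{
    return pc + (uint32_t)sign_extend(s18, 18);
}
```

With 4-byte instructions, the largest positive 11-bit byte offset is 1020, which corresponds to 255 instructions, matching the placement limit stated in the rules that follow.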

The following rules apply to the hint for branch (HBR) instructions:

An HBR instruction should be placed at least 11 cycles plus four instruction pairs before the branch it hints. In other words, an HBR instruction must be followed by at least 11 cycles' worth of instructions, and then by eight instructions aligned on an even address boundary, before the hinted branch. Greater separation between the hint and the branch is expected to improve performance on future SPU implementations.
If an HBR instruction is placed too close to the branch, a hint stall results: the branch instruction stalls until the timing requirement of the HBR instruction is satisfied.
If an HBR instruction is placed closer to the hint-trigger address than four instruction pairs plus one cycle, no hint stall occurs, but the HBR is ignored.
Only one HBR instruction can be active at a time. Issuing another HBR cancels the current one.
An HBR instruction can be moved outside of a loop and will be effective on each loop iteration as long as another HBR instruction is not executed.
The HBR instruction must be placed within 255 instructions of the branch instruction.
HBR instructions affect only performance; they never change program results.
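The distance rules above can be approximated by a small classifier. This is a rough sketch under loudly stated assumptions: it treats every instruction as taking exactly one cycle (the dual-issue SPU does not guarantee this), it ignores the even-address alignment of the instruction pairs, it only handles an HBR placed before its branch, and it reads "within 255 instructions" as a branch index minus HBR index of at most 255. The `classify_hint` function and its result names are hypothetical.

```c
#include <assert.h>

typedef enum {
    HINT_OUT_OF_RANGE, /* branch more than 255 instructions away        */
    HINT_IGNORED,      /* too close: no stall, but the hint is not used */
    HINT_STALLS,       /* branch waits until hint timing is satisfied   */
    HINT_EFFECTIVE     /* far enough ahead to take full effect          */
} hint_placement;

/* ASSUMPTION: one instruction per cycle, alignment ignored. */
static hint_placement classify_hint(int hbr_index, int branch_index)
{
    assert(hbr_index < branch_index);
    int distance = branch_index - hbr_index; /* HBR to branch              */
    int gap = distance - 1;                  /* instructions strictly between */

    if (distance > 255)
        return HINT_OUT_OF_RANGE;
    if (gap < 8 + 1)       /* closer than four pairs plus one cycle */
        return HINT_IGNORED;
    if (gap < 11 + 8)      /* less than 11 cycles plus four pairs   */
        return HINT_STALLS;
    return HINT_EFFECTIVE;
}
```

In real code the cycle count between the hint and the branch depends on the instruction schedule, so the compiler's or a static timing tool's view of the schedule is the better guide.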

The HBR instructions can be used to support multiple strategies of branch prediction. These include:

Static Branch Prediction — Prediction based upon branch type or displacement, and prediction based upon profiling or linguistic hints.
Dynamic Branch Prediction — Software caching of branch-target addresses, and using control flow to record branching history.
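As one example of prediction based upon branch type or displacement, the sketch below applies the common backward-taken/forward-not-taken heuristic, chosen here as an illustration rather than taken from this text: unconditional branches are always taken, backward conditional branches usually close loops and are predicted taken (worth an HBR), and forward conditional branches are predicted not taken (no hint, which is already the SPU default). The `branch_kind` type and `static_predict_taken` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BR_UNCONDITIONAL, BR_CONDITIONAL } branch_kind;

/* Returns true if a hint (predict taken) should be emitted. */
static bool static_predict_taken(branch_kind kind,
                                 uint32_t branch_addr, uint32_t target_addr)
{
    assert(kind == BR_UNCONDITIONAL || kind == BR_CONDITIONAL);
    if (kind == BR_UNCONDITIONAL)
        return true;                   /* always transfers control     */
    return target_addr <= branch_addr; /* backward: likely loop branch */
}
```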

A common approach to generating static branch predictions is to use expert knowledge, obtained either through feedback-directed optimization techniques or from linguistic hints supplied by the programmer.
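Feedback-directed optimization boils down to counting branch outcomes in a training run and then fixing the static prediction from those counts. The sketch below shows only that decision step; the `branch_profile` record, the functions, and the simple majority threshold are hypothetical choices for illustration, and a real toolchain gathers and applies profiles through its own mechanisms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-branch profile gathered during a training run. */
typedef struct {
    unsigned long taken;
    unsigned long not_taken;
} branch_profile;

static void profile_record(branch_profile *p, bool was_taken)
{
    assert(p != NULL);
    if (was_taken)
        p->taken++;
    else
        p->not_taken++;
}

/* ASSUMPTION: hint as taken only if the branch was taken more often
   than not during training; otherwise rely on the not-taken default. */
static bool profile_says_hint(const branch_profile *p)
{
    assert(p != NULL);
    return p->taken > p->not_taken;
}
```

The resulting decision is the kind of expected value a programmer would then write into a static hint.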

The document C/C++ Language Extensions for Cell Broadband Engine Architecture defines a mechanism for directing branch prediction. The __builtin_expect directive allows programmers to predict conditional program statements. The following example demonstrates how a programmer can predict that a conditional statement is false (a is not larger than b).

	if(__builtin_expect((a>b),0))
	  c += a;
	else
	  d += 1;
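The same example, wrapped in a compilable function, is shown below. `__builtin_expect` is also available in GCC and Clang on other targets, so this sketch builds outside the Cell SDK; on those compilers it influences the compiler's branch-probability estimates (for example, code layout) rather than producing SPU hint instructions. The function name `update` and its pointer parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Predict that (a > b) is false, so the else path is the common case. */
static void update(int a, int b, int *c, int *d)
{
    assert(c != NULL && d != NULL);
    if (__builtin_expect((a > b), 0))
        *c += a;   /* expected to be rare  */
    else
        *d += 1;   /* expected common path */
}
```

The hint changes only performance: both paths still execute correctly when the prediction is wrong.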

The __builtin_expect directive can be used not only for static branch prediction but also for dynamic branch prediction.
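Dynamic prediction depends on branch history recorded at run time. This text does not show how the expected value is supplied in that case, so the sketch below illustrates only the history-recording side, using a 2-bit saturating counter: a standard scheme chosen here as an assumption, not one prescribed by the document. The `branch_history` type and its functions are hypothetical; their current prediction is the kind of value that could drive a run-time hint decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2-bit saturating counter: states 0-1 predict not taken, 2-3 taken. */
typedef struct { uint8_t state; } branch_history;

static bool history_predict_taken(const branch_history *h)
{
    assert(h != NULL && h->state <= 3);
    return h->state >= 2;
}

/* Move one step toward the observed outcome, saturating at 0 and 3. */
static void history_update(branch_history *h, bool taken)
{
    assert(h != NULL && h->state <= 3);
    if (taken && h->state < 3)
        h->state++;
    else if (!taken && h->state > 0)
        h->state--;
}
```

The two-step hysteresis means a single surprise outcome does not flip a strongly established prediction.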