Floating-point operations

The SPU executes both single-precision and double-precision floating-point operations. Single-precision instructions are performed in 4-way SIMD fashion, fully pipelined, whereas double-precision instructions are partially pipelined.

The data formats for single-precision and double-precision instructions are those defined by IEEE Standard 754, but the results calculated by single-precision instructions are not fully compliant with IEEE Standard 754.

For single-precision operations, the range of normalized numbers is extended beyond the IEEE standard. The representable, nonzero numbers range in magnitude from Xmin = 2⁻¹²⁶ to Xmax = (2 - 2⁻²³) × 2¹²⁸. If the exact result overflows (that is, if it is larger in magnitude than Xmax), the rounded result is set to Xmax with the appropriate sign. If the exact result underflows (that is, if it is smaller in magnitude than Xmin), the rounded result is forced to zero. A zero result is always a positive zero.
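The overflow and underflow rules can be illustrated on an ordinary host with a small C sketch. It is not SPU code: the names SPU_XMIN, SPU_XMAX, and spu_range_clamp are hypothetical, the exact result is assumed to be available as a host double, and rounding of in-range values is deliberately left out.

```c
/* Bounds of the extended single-precision range, held in host doubles.
   Both values are exactly representable in IEEE 754 binary64. */
static const double SPU_XMIN = 0x1p-126;                  /* 2^-126 */
static const double SPU_XMAX = (2.0 - 0x1p-23) * 0x1p128; /* (2 - 2^-23) * 2^128 */

/* Hypothetical helper: apply only the range rule to an exact result.
   In-range values are returned unchanged (truncation is not modeled). */
double spu_range_clamp(double exact)
{
    double mag = exact < 0.0 ? -exact : exact;

    if (mag > SPU_XMAX)            /* overflow: saturate, keep the sign */
        return exact < 0.0 ? -SPU_XMAX : SPU_XMAX;
    if (mag < SPU_XMIN)            /* underflow: always positive zero */
        return 0.0;
    return exact;
}
```

Under this sketch, spu_range_clamp(-1e40) yields -SPU_XMAX, and spu_range_clamp(-1e-40) yields positive zero rather than a denormal. Note that SPU_XMAX is twice the largest finite IEEE single-precision value.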

Single-precision floating-point operations implement IEEE 754 arithmetic with the following changes:

Only one rounding mode is supported: round towards zero, also known as truncation.
Denormal operands are treated as zero, and denormal results are forced to zero.
Numbers with an exponent of all ones are interpreted as normalized numbers and not as infinity or not-a-number (NaN).
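The three changes listed above can be made concrete with a host-side C sketch. It is an illustration, not SPU code: spu_single_value and spu_truncate are hypothetical helper names, the host is assumed to use IEEE 754 binary64 for double, and the sign of a zero produced from a denormal operand is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of a 32-bit single-precision bit pattern
   under the interpretation described above. */
double spu_single_value(uint32_t bits)
{
    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    double   mag;
    int      e;

    if (exp == 0)                  /* zero or denormal: treated as zero */
        return 0.0;

    /* Every other exponent, including all ones, is a normalized
       number: 1.frac * 2^(exp - 127). */
    mag = 1.0 + (double)frac / 8388608.0;   /* 8388608 = 2^23 */
    for (e = (int)exp - 127; e > 0; e--) mag *= 2.0;
    for (; e < 0; e++)                   mag *= 0.5;
    return sign ? -mag : mag;
}

/* Hypothetical helper: round toward zero to a 24-bit significand by
   clearing the low 29 fraction bits of a binary64 value. Valid only
   for finite inputs inside the normalized single-precision range. */
double spu_truncate(double exact)
{
    uint64_t bits;

    memcpy(&bits, &exact, sizeof bits);
    bits &= ~((UINT64_C(1) << 29) - 1);
    memcpy(&exact, &bits, sizeof bits);
    return exact;
}
```

For example, the pattern 0x7F800000, which IEEE 754 reads as positive infinity, decodes here to 2¹²⁸, and 0x7FFFFFFF, an IEEE NaN, decodes to the largest representable magnitude. Truncation maps 1 + 1.5 × 2⁻²⁴ to 1.0, whereas round-to-nearest would give 1 + 2⁻²³.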

Double-precision operations do not support the IEEE precise trap (exception) mode. If a double-precision denormal or not-a-number (NaN) result does not conform to IEEE Standard 754, then the deviation is recorded in a sticky bit in the Floating-Point Status and Control Register (FPSCR), which can be accessed using the fscrrd and fscrwr instructions or the spu_mffpscr and spu_mtfpscr intrinsics.

Double-precision instructions are performed as two double-precision operations in 2-way SIMD fashion. However, the SPU is capable of performing only one double-precision operation per cycle. Thus, the SPU executes double-precision instructions by breaking up the SIMD operands and executing the two operations in consecutive instruction slots in the pipeline. Although double-precision instructions have 13-clock-cycle latencies, only the final seven cycles are pipelined. No other instructions are dual-issued with double-precision instructions, and no instructions of any kind are issued for six cycles after a double-precision instruction is issued.
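The issue restriction implies a simple cost model for a run of back-to-back double-precision instructions, sketched below in C. This is a simplified reading of the figures above, not a cycle-accurate model: it assumes the instructions are independent, that nothing else stalls the pipeline, and that a new instruction can issue on the first cycle after the six-cycle stall. The function names are hypothetical.

```c
enum {
    DP_LATENCY     = 13, /* cycles from issue to result */
    DP_ISSUE_STALL = 6   /* cycles with no issue after a DP instruction */
};

/* Cycle on which the k-th (0-based) back-to-back DP instruction issues:
   one issue cycle plus six stalled cycles per preceding instruction. */
unsigned dp_issue_cycle(unsigned k)
{
    return k * (DP_ISSUE_STALL + 1);
}

/* Cycles until the last of n independent DP instructions completes. */
unsigned dp_total_cycles(unsigned n)
{
    return n == 0 ? 0 : dp_issue_cycle(n - 1) + DP_LATENCY;
}
```

Under these assumptions one instruction takes 13 cycles, two take 20, and ten take 76, so sustained throughput approaches one 2-way SIMD double-precision instruction every seven cycles.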