The SPU loads and stores one quadword at a time. When instructions use or
produce scalar operands (including addresses), the value is kept in the
preferred scalar slot of a SIMD register. Scalar (sub-quadword) loads and
stores therefore require several instructions to format the data for the SIMD
architecture of the SPE: a loaded scalar must be rotated into the preferred
slot, and a scalar store requires a read, a scalar insert, and a write. These
extra formatting instructions reduce performance.
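To make this overhead concrete, the following portable C sketch models a memory that supports only 16-byte-aligned quadword accesses. It is a hypothetical model, not SPU code: `quadword`, `local_store`, `load_quadword`, `store_quadword`, `load_word`, and `store_word` are illustrative names, and each 32-bit word is assumed to be 4-byte aligned. A scalar load becomes a quadword load plus a rotate into the preferred slot (bytes 0-3), and a scalar store becomes a read, an insert, and a write.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a tiny "local store" whose only memory operations are
   16-byte-aligned quadword loads and stores. Addresses must be < 256. */
typedef struct { uint8_t b[16]; } quadword;

static uint8_t local_store[256];

/* Load the aligned quadword containing addr; the low 4 address bits are ignored. */
static quadword load_quadword(uint32_t addr) {
    quadword q;
    memcpy(q.b, &local_store[addr & ~15u], 16);
    return q;
}

/* Store a whole quadword to the aligned quadword containing addr. */
static void store_quadword(uint32_t addr, quadword q) {
    memcpy(&local_store[addr & ~15u], q.b, 16);
}

/* Rotate the quadword left by n bytes. */
static quadword rotate_left_bytes(quadword q, unsigned n) {
    quadword r;
    for (unsigned i = 0; i < 16; i++)
        r.b[i] = q.b[(i + n) & 15];
    return r;
}

/* Scalar load of a 4-byte-aligned 32-bit word: one quadword load, then a
   rotate so the word lands in the preferred slot (bytes 0-3, big-endian). */
static uint32_t load_word(uint32_t addr) {
    quadword q = rotate_left_bytes(load_quadword(addr), addr & 15);
    return ((uint32_t)q.b[0] << 24) | ((uint32_t)q.b[1] << 16) |
           ((uint32_t)q.b[2] << 8)  |  (uint32_t)q.b[3];
}

/* Scalar store of a 32-bit word: read the quadword, insert the word,
   write the whole quadword back so neighboring bytes are preserved. */
static void store_word(uint32_t addr, uint32_t value) {
    quadword q = load_quadword(addr);          /* read   */
    unsigned off = addr & 15;                  /* insert */
    q.b[off]     = (uint8_t)(value >> 24);
    q.b[off + 1] = (uint8_t)(value >> 16);
    q.b[off + 2] = (uint8_t)(value >> 8);
    q.b[off + 3] = (uint8_t)value;
    store_quadword(addr, q);                   /* write  */
}
```

In this model, storing a word at address 24 leaves the word at address 20 intact only because the store reads the surrounding quadword first; that read and the insert are the extra work a scalar store costs.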
Because of this overhead, vector operations on scalar data are not efficient.
The following strategies can make operations on scalar data more efficient:
- Change the scalars to quadword vectors. Eliminating the three extra
instructions associated with loading and storing scalars reduces code size and
execution time.
- Cluster scalars into groups, and load multiple scalars at a time using
a quadword memory access. Manually extract or insert the scalars as needed.
This eliminates redundant loads and stores.
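Both strategies can be illustrated with a hypothetical portable C model that counts quadword memory accesses. Here `qword4` stands in for a 128-bit register with element 0 playing the role of the preferred slot; `qword4`, the counters, and the helper functions are illustrative names only, and real SPU code would use vector types and compiler intrinsics instead.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit register holding four 32-bit elements;
   element 0 stands in for the preferred slot. */
typedef struct { _Alignas(16) int32_t w[4]; } qword4;

/* Counters of quadword memory accesses performed by the model. */
static unsigned qw_loads, qw_stores;

static qword4 load_qw(const qword4 *p)      { qw_loads++;  return *p; }
static void   store_qw(qword4 *p, qword4 v) { qw_stores++; *p = v; }

/* Strategy 1: a scalar kept as a full quadword vector (value in element 0).
   Storing it is a single quadword write; no read or insert is needed. */
static void set_vector_scalar(qword4 *slot, int32_t value) {
    qword4 v = { { value, 0, 0, 0 } };
    store_qw(slot, v);
}

/* For contrast: a 32-bit scalar that shares a quadword with other data
   must be stored with a read, an insert, and a write. */
static void set_packed_scalar(qword4 *containing, int element, int32_t value) {
    qword4 v = load_qw(containing);   /* read   */
    v.w[element] = value;             /* insert */
    store_qw(containing, v);          /* write  */
}

/* Strategy 2: four related scalars clustered into one quadword, so a single
   quadword load brings all of them into a register; each is then extracted. */
static int32_t sum_cluster(const qword4 *cluster) {
    qword4 v = load_qw(cluster);               /* one memory access */
    return v.w[0] + v.w[1] + v.w[2] + v.w[3];  /* manual extraction */
}
```

In this model, keeping a value as a whole quadword turns its store into a single write, and clustering four scalars lets one quadword load replace four separate scalar loads.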
SPU intrinsics are provided in the C/C++ Language Extensions to efficiently
promote scalars to vectors, or vectors to scalars. These intrinsics are listed
in Table 1.
Table 1. Intrinsics for Changing Scalar and Vector Data Types

Instruction     | Description
d = spu_insert  | Insert a scalar into a specified vector element.
d = spu_promote | Promote a scalar to a vector.
d = spu_extract | Extract a vector element from its vector.
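A short sketch of the three intrinsics in use. In the SPU C/C++ Language Extensions they are called as `d = spu_promote(a, element)`, `d = spu_insert(a, b, element)`, and `d = spu_extract(a, element)`. When compiled for the SPU (`__SPU__` defined), the code below uses the real intrinsics from `spu_intrinsics.h`. On other hosts it falls back to hypothetical stand-ins that cover only the 32-bit integer vector case, built on the GCC/Clang `vector_size` extension so the example can be compiled and tested anywhere. `v4si`, `make_vector`, and `sum_elements` are illustrative names.

```c
#include <stdint.h>

#ifdef __SPU__
#include <spu_intrinsics.h>
typedef vector signed int v4si;
#else
/* Hypothetical host stand-ins (assumption, not the SPU library): they mimic
   the intrinsics for four 32-bit elements only and mask the index to 0..3. */
typedef int32_t v4si __attribute__((vector_size(16)));

static v4si spu_promote(int32_t a, int element) {
    v4si d = {0};          /* on the SPU, the other elements are undefined */
    d[element & 3] = a;
    return d;
}
static v4si spu_insert(int32_t a, v4si b, int element) {
    b[element & 3] = a;
    return b;
}
static int32_t spu_extract(v4si a, int element) {
    return a[element & 3];
}
#endif

/* Build a vector from four scalars: promote the first into element 0
   (the preferred slot for a 32-bit value), then insert the others. */
static v4si make_vector(int32_t a, int32_t b, int32_t c, int32_t d) {
    v4si v = spu_promote(a, 0);
    v = spu_insert(b, v, 1);
    v = spu_insert(c, v, 2);
    v = spu_insert(d, v, 3);
    return v;
}

/* Extract each element back to a scalar and add them up. */
static int32_t sum_elements(v4si v) {
    return spu_extract(v, 0) + spu_extract(v, 1) +
           spu_extract(v, 2) + spu_extract(v, 3);
}
```

Because `spu_promote` leaves the non-target elements undefined, `make_vector` overwrites every other element with `spu_insert` before the vector is used as a whole.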