DMA commands transfer data between the LS and main storage.
The LS data is accessed sequentially with a minimum step of one quadword.
A DMA transfer between the LS and main storage can be wrapped in a C-callable function with the following prototype:

    extern void dma_transfer(volatile void *lsa,    // local store address
                             unsigned int eah,      // high 32-bit effective address
                             unsigned int eal,      // low 32-bit effective address
                             unsigned int size,     // transfer size in bytes
                             unsigned int tag_id,   // tag identifier (0-31)
                             unsigned int cmd);     // DMA command opcode
The function can be implemented in SPU assembly by writing each argument to the corresponding MFC command-parameter channel; writing the command opcode enqueues the transfer:

        .text
        .global dma_transfer
    dma_transfer:
        wrch    $MFC_LSA, $3       # local store address
        wrch    $MFC_EAH, $4       # high 32 bits of effective address
        wrch    $MFC_EAL, $5       # low 32 bits of effective address
        wrch    $MFC_Size, $6      # transfer size in bytes
        wrch    $MFC_TagID, $7     # tag identifier
        wrch    $MFC_Cmd, $8       # DMA command opcode (enqueues the transfer)
        bi      $0                 # return to caller
The same function can be implemented in C with the spu_mfcdma64 composite intrinsic:

    #include <spu_intrinsics.h>

    void dma_transfer(volatile void *lsa,    // local store address
                      unsigned int eah,      // high 32-bit effective address
                      unsigned int eal,      // low 32-bit effective address
                      unsigned int size,     // transfer size in bytes
                      unsigned int tag_id,   // tag identifier (0-31)
                      unsigned int cmd)      // DMA command opcode
    {
        spu_mfcdma64(lsa, eah, eal, size, tag_id, cmd);
    }
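As a usage sketch, the wrapper above can be called to queue a get (main storage to LS) and then wait for completion. The function name fetch_block, the buffer, the tag value, and the 64-bit address parameter are illustrative assumptions; MFC_GET_CMD, mfc_write_tag_mask, and mfc_read_tag_status_all come from spu_mfcio.h.

    #include <spu_mfcio.h>        // MFC_GET_CMD and tag-status helpers

    // Prototype from the example above.
    extern void dma_transfer(volatile void *lsa, unsigned int eah,
                             unsigned int eal, unsigned int size,
                             unsigned int tag_id, unsigned int cmd);

    // 16 KB local-store buffer (16 KB is the maximum size of a single DMA
    // transfer), aligned for efficient DMA.
    static volatile char buf[16384] __attribute__((aligned(128)));

    void fetch_block(unsigned long long ea)   // 64-bit main-storage address
    {
        unsigned int tag = 5;                 // any tag in the range 0-31

        // Queue a get command: main storage -> LS.
        dma_transfer(buf, (unsigned int)(ea >> 32), (unsigned int)ea,
                     sizeof(buf), tag, MFC_GET_CMD);

        // Block until all queued commands with this tag have completed.
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();
    }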
The performance of a DMA data transfer is best when the source and destination addresses are aligned on a cache-line (128-byte) boundary and the transfer is at least one cache line in size.
Quadword-offset-aligned data transfers generate full cache-line bus requests for every unrolling except, possibly, the first and the last.
Transfers that start or end in the middle of a cache line transfer a partial cache line (less than 8 quadwords) in the first or last bus request, respectively.
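The following sketch illustrates the alignment guideline, assuming a 128-byte cache line (8 quadwords). The names get_aligned, ls_buf, and CACHE_LINE are illustrative; mfc_get and the tag-status helpers are the convenience functions declared in spu_mfcio.h.

    #include <spu_mfcio.h>

    #define CACHE_LINE 128   // cache-line size in bytes (8 quadwords)

    // Aligning the LS buffer (and choosing a size that is a multiple of 128
    // bytes) lets the MFC unroll the command into full cache-line bus requests.
    static volatile char ls_buf[4096] __attribute__((aligned(CACHE_LINE)));

    void get_aligned(unsigned long long ea, unsigned int tag)
    {
        // If the effective address ea is also 128-byte aligned, every bus
        // request moves a full cache line; a start or end in the middle of a
        // cache line would instead produce a partial (< 8 quadword) first or
        // last request.
        mfc_get(ls_buf, ea, sizeof(ls_buf), tag, 0, 0);
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();
    }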