DMA transfers

DMA commands transfer data between the LS and main storage.

Main storage is addressed by the effective address (EA) operand of a DMA command, and the LS is addressed by the local store address (LSA) operand. A single DMA transfer can move at most 16 KB.
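The EA is passed to the MFC as two 32-bit halves (EAH and EAL), and the 16 KB limit means a larger copy has to be issued as several DMA commands. The following portable C sketch shows both calculations. It is host-side arithmetic only, not SPU code, and `split_ea`, `dma_command_count`, and `DMA_MAX_TRANSFER` are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Largest number of bytes a single DMA command can move (16 KB). */
#define DMA_MAX_TRANSFER 16384u

/* Hypothetical helper: split a 64-bit effective address into the high
   and low 32-bit words used as the EAH and EAL operands. */
void split_ea(uint64_t ea, uint32_t *eah, uint32_t *eal)
{
    *eah = (uint32_t)(ea >> 32);
    *eal = (uint32_t)(ea & 0xFFFFFFFFu);
}

/* Hypothetical helper: number of DMA commands needed to move `total`
   bytes when each command moves at most DMA_MAX_TRANSFER bytes. */
uint32_t dma_command_count(uint64_t total)
{
    return (uint32_t)((total + DMA_MAX_TRANSFER - 1u) / DMA_MAX_TRANSFER);
}
```

For example, a 64 KB buffer needs four 16 KB commands, while 16 KB plus one byte already needs two.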

The LS data is accessed sequentially with a minimum step of one quadword.
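Because the LS is accessed in quadword (16-byte) steps, it can help to reason about a transfer in terms of the quadwords it spans. The sketch below models only that addressing arithmetic, not the MFC's internal access pattern; `quadword_base` and `quadwords_spanned` are hypothetical helpers.

```c
#include <stdint.h>

#define QUADWORD_BYTES 16u

/* LS address of the quadword that contains `lsa`. */
uint32_t quadword_base(uint32_t lsa)
{
    return lsa & ~(QUADWORD_BYTES - 1u);
}

/* Number of quadwords spanned by a transfer of `size` bytes (size > 0)
   that starts at local store address `lsa`. */
uint32_t quadwords_spanned(uint32_t lsa, uint32_t size)
{
    uint32_t first = lsa / QUADWORD_BYTES;
    uint32_t last  = (lsa + size - 1u) / QUADWORD_BYTES;
    return last - first + 1u;
}
```

A 16-byte transfer that starts 8 bytes into a quadword spans two quadwords, whereas the same transfer starting on a quadword boundary spans one.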

Software on an SPE accesses its MFC's DMA-transfer facilities through the channels listed in Channels. To enqueue a DMA command, SPE software uses the wrch instruction (described in Channel instructions) to write the MFC command-parameter channels in the following sequence:
  1. Write the LSA to the MFC_LSA channel.
  2. Write the high 32 bits of the EA (EAH) to the MFC_EAH channel.
  3. Write the low 32 bits of the EA (EAL) to the MFC_EAL channel.
  4. Write the transfer size to the MFC_Size channel.
  5. Write the tag ID to the MFC_TagID channel.
  6. Write the class ID and command opcode to the MFC_Cmd channel.
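The channel-write protocol can be modeled on a host machine to make the ordering explicit. In this sketch, a hypothetical `write_channel` function records each write in a log in place of the real wrch instruction, and the channel order follows the assembly routine given later in this section (MFC_LSA first, MFC_Cmd last). The enum values are placeholders, not the real MFC channel numbers, and none of these names come from an SPU header.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder names for the MFC command-parameter channels, in write order.
   The numeric values are NOT the hardware channel numbers. */
typedef enum { CH_LSA, CH_EAH, CH_EAL, CH_SIZE, CH_TAGID, CH_CMD } mfc_channel;

typedef struct { mfc_channel ch; uint32_t value; } channel_write;

/* Hypothetical log of channel writes, standing in for the MFC. */
typedef struct {
    channel_write entries[8];
    size_t count;
} channel_log;

/* Stand-in for wrch: append one channel write to the log. */
static void write_channel(channel_log *rec, mfc_channel ch, uint32_t value)
{
    if (rec->count < sizeof rec->entries / sizeof rec->entries[0]) {
        rec->entries[rec->count].ch = ch;
        rec->entries[rec->count].value = value;
        rec->count++;
    }
}

/* Issue the parameter writes for one DMA command. MFC_Cmd is written
   last because that write is what enqueues the command. */
void enqueue_dma(channel_log *rec, uint32_t lsa, uint32_t eah, uint32_t eal,
                 uint32_t size, uint32_t tag_id, uint32_t cmd)
{
    write_channel(rec, CH_LSA, lsa);
    write_channel(rec, CH_EAH, eah);
    write_channel(rec, CH_EAL, eal);
    write_channel(rec, CH_SIZE, size);
    write_channel(rec, CH_TAGID, tag_id);
    write_channel(rec, CH_CMD, cmd);
}
```

On a real SPE, each `write_channel` call corresponds to one wrch instruction.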
The following example shows how to initiate a DMA transfer from an SPE.
extern void dma_transfer(volatile void *lsa,     // local store address
              unsigned int eah,        // high 32 bits of effective address
              unsigned int eal,        // low 32 bits of effective address
              unsigned int size,       // transfer size in bytes
              unsigned int tag_id,     // tag identifier (0-31)
              unsigned int cmd);       // class ID and command opcode
An ABI-compliant assembly-language implementation of the subroutine is:
   .text
   .global   dma_transfer
dma_transfer:
   wrch        $MFC_LSA, $3
   wrch        $MFC_EAH, $4
   wrch        $MFC_EAL, $5
   wrch        $MFC_Size, $6
   wrch        $MFC_TagID, $7
   wrch        $MFC_Cmd, $8
   bi          $0 
A comparable C implementation using the SPU composite intrinsic spu_mfcdma64 is:
#include <spu_intrinsics.h> 
void dma_transfer(volatile void *lsa, unsigned int eah, unsigned int eal, 
             unsigned int size, unsigned int tag_id, unsigned int cmd)
{
     spu_mfcdma64(lsa, eah, eal, size, tag_id, cmd);
} 

A DMA transfer performs best when the source and destination addresses are aligned on a cache-line boundary and the transfer is at least one cache line in size.
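A program can check for this favorable case before issuing a transfer. The sketch below assumes a 128-byte cache line (8 quadwords, matching the partial-line description in this section); `cache_line_align_up` and `dma_is_cache_line_friendly` are hypothetical helpers, not library functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache-line size: 128 bytes (8 quadwords). */
#define CACHE_LINE_BYTES 128u

/* Round `addr` up to the next cache-line boundary, for example when
   placing a DMA buffer inside a larger allocation. */
uint64_t cache_line_align_up(uint64_t addr)
{
    return (addr + CACHE_LINE_BYTES - 1u) & ~(uint64_t)(CACHE_LINE_BYTES - 1u);
}

/* True when both addresses start on a cache-line boundary and the
   transfer covers at least one full cache line. */
bool dma_is_cache_line_friendly(uint32_t lsa, uint64_t ea, uint32_t size)
{
    return (lsa % CACHE_LINE_BYTES) == 0u &&
           (ea % CACHE_LINE_BYTES) == 0u &&
           size >= CACHE_LINE_BYTES;
}
```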

The MFC unrolls each DMA command into a sequence of bus requests. For quadword-offset-aligned transfers, every bus request except possibly the first and the last moves a full cache line.

Transfers that start or end in the middle of a cache line move a partial cache line (fewer than 8 quadwords, that is, less than 128 bytes) in the first or last bus request, respectively.
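The effect of misalignment on the number of bus requests can be estimated with a simplified model. This sketch assumes a 128-byte cache line and splits the EA range at cache-line boundaries, counting one bus request per piece. It ignores LS alignment and bus-protocol details, so it illustrates the full-line versus partial-line behavior described above rather than reproducing the MFC's exact unrolling. `model_unrolling` and `unroll_summary` are hypothetical names.

```c
#include <stdint.h>

/* Assumed cache-line size: 128 bytes (8 quadwords). */
#define CACHE_LINE_BYTES 128u

typedef struct {
    uint32_t requests;    /* total modeled bus requests */
    uint32_t full_lines;  /* requests that move a whole cache line */
    uint32_t head_bytes;  /* bytes in a leading partial line (0 if none) */
    uint32_t tail_bytes;  /* bytes in a trailing partial line (0 if none) */
} unroll_summary;

/* Split [ea, ea + size) at cache-line boundaries; each piece counts
   as one bus request. Requires size > 0. */
unroll_summary model_unrolling(uint64_t ea, uint32_t size)
{
    unroll_summary s = {0, 0, 0, 0};
    uint64_t pos = ea;
    uint64_t end = ea + size;

    while (pos < end) {
        uint64_t line_end = (pos / CACHE_LINE_BYTES + 1u) * CACHE_LINE_BYTES;
        uint64_t piece_end = line_end < end ? line_end : end;
        uint32_t bytes = (uint32_t)(piece_end - pos);

        if (bytes == CACHE_LINE_BYTES)
            s.full_lines++;          /* full cache-line request */
        else if (pos == ea)
            s.head_bytes = bytes;    /* partial line in the first request */
        else
            s.tail_bytes = bytes;    /* partial line in the last request */

        s.requests++;
        pos = piece_end;
    }
    return s;
}
```

Under this model, a 1 KB transfer starting on a cache-line boundary needs 8 full-line requests, while the same transfer starting 64 bytes into a line needs 9 requests: a 64-byte head, 7 full lines, and a 64-byte tail.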