SPE accelerator memory constraints

The SPE accelerator has 256 KB of local memory, which is shared by code and data. Memory is neither virtualized nor protected. See Figure 1 for a typical memory map of an SPU program. The runtime stack sits above the global data section and grows from higher addresses toward lower addresses until it reaches the global data section. Because of limitations in programming languages and in compiler and linker tools, you cannot predict the maximum stack usage, either when you develop the application or when the application is loaded. If the stack requires more memory than was allocated, you do not get a stack overflow exception unless this was enabled by the compiler at build time. Instead, you get undefined results such as a bus error or an illegal instruction. When there is a stack overflow, the SPU application is shut down and a message is sent to the PPE.
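The downward-growing stack described above can be illustrated with a little address arithmetic. The following is a minimal sketch in plain C, not part of ALF or the SPU runtime: the helper names `stack_headroom` and `frame_fits` are hypothetical, and the addresses passed in are assumed to be local store offsets that the caller obtained by some other means.

```c
#include <assert.h>
#include <stdint.h>

/* Total SPE local store: 256 KB shared by code, data, and stack. */
#define LOCAL_STORE_SIZE (256u * 1024u)

/* Hypothetical helper: bytes left between the end of the global data
 * section and the current stack pointer. The stack grows downward,
 * so the headroom shrinks as sp decreases. Returns 0 when the stack
 * has already reached (or run into) the global data section. */
static uint32_t stack_headroom(uint32_t data_end, uint32_t sp)
{
    assert(data_end <= LOCAL_STORE_SIZE && sp <= LOCAL_STORE_SIZE);
    return (sp > data_end) ? (sp - data_end) : 0u;
}

/* Hypothetical helper: nonzero if a new stack frame of frame_size
 * bytes would still fit above the global data section. */
static int frame_fits(uint32_t data_end, uint32_t sp, uint32_t frame_size)
{
    return frame_size <= stack_headroom(data_end, sp);
}
```

For example, with global data ending at offset `0x10000` and the stack pointer at `0x10100`, a 256-byte frame still fits but a 257-byte frame does not. Because local memory is not protected, nothing performs this check automatically unless stack checking was enabled at build time.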

ALF allocates the work block buffers directly from the memory region above the runtime stack, as shown in Figure 2. This is implemented by moving the stack pointer (or, equivalently, by pushing a large amount of data onto the stack). For ALF, the larger the buffer, the better it can optimize the performance of a task with techniques such as double buffering, so it is better to let ALF allocate as much memory as possible from the runtime stack. However, if the stack size is too small at runtime, a stack overflow occurs and causes unexpected behavior such as incorrect results or a bus error.
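The trade-off between buffer size and remaining stack space can be sketched as a simple budget calculation. This is an illustration only and does not reflect how ALF actually sizes its buffers: the function `work_block_buffer_size`, the `stack_reserve` parameter, and the choice to round down to a 128-byte boundary are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alignment for buffers in this sketch. */
#define BUFFER_ALIGN 128u

/* Hypothetical helper: split the free local store among num_buffers
 * equally sized work block buffers (2 for double buffering) after
 * setting aside stack_reserve bytes for the runtime stack. The
 * result is rounded down to a multiple of BUFFER_ALIGN. Returns 0
 * when the reserve leaves no room for buffers. */
static uint32_t work_block_buffer_size(uint32_t free_bytes,
                                       uint32_t stack_reserve,
                                       uint32_t num_buffers)
{
    assert(num_buffers > 0u);
    if (free_bytes <= stack_reserve) {
        return 0u;
    }
    uint32_t per_buffer = (free_bytes - stack_reserve) / num_buffers;
    return per_buffer - (per_buffer % BUFFER_ALIGN);
}
```

With 192 KB free, a 16 KB stack reserve, and two buffers, each buffer gets 90112 bytes (88 KB). Shrinking the reserve enlarges the buffers but raises the risk of the stack overflow described above, which is why the reserve has to be chosen conservatively.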

Figure 1. SPU local memory map of a common Cell BE application

Figure 2. SPU local memory map of an ALF application