What to consider for data layout design

Efficient data partitioning and data layout design are key to a well-performing ALF application. Improper data partitioning and data layout design either prevent ALF from being applicable or result in degraded performance. Data partitioning and layout are closely coupled with compute kernel design and implementation, so consider them together. Consider the following for your data layout and partition design:

Use the correct size for the data partitioned for each work block. The local memory of the accelerator is often limited, and performance can degrade if the partitioned data cannot fit into the available memory. For example, on the Cell BE architecture, if the input buffer of a work block is larger than 128 KB, it might not be possible to support double buffering on the SPE. This can result in up to 50% performance loss.
Minimize the amount of data movement. Moving large amounts of data can cause performance loss in applications, so avoid unnecessary data movement.
Simplify data movement patterns. Although the data transfer list feature of ALF enables flexible data gathering and scattering patterns, it is better to keep the data movement patterns as simple as possible. Good examples are sequential access and contiguous movements rather than many small discrete movements.
Avoid data reorganization. Data reorganization requires extra work. It is better to organize data in a way that suits the usage pattern of the algorithm than to write extra code to reorganize the data when it is used.
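Be aware of the address alignment limitations on Cell BE. Data moved between main memory and SPE local memory must satisfy the alignment requirements of the architecture, so plan buffer addresses and sizes accordingly.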
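
The sizing and alignment considerations above can be checked with simple arithmetic before any kernel code is written. The following C code is a minimal sketch, not part of the ALF API: the function names, the 64 KB reserved for code and stack, and the 16-byte alignment value are assumptions made for illustration, and the 256 KB local memory size should be confirmed against your platform documentation. It estimates the local memory footprint of a work block when each buffer is kept in several copies (two for double buffering) and finds the largest element count that fits a given budget.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only; confirm against platform documentation. */
#define LOCAL_STORE_BYTES ((size_t)256 * 1024) /* assumed SPE local memory size */
#define RESERVED_BYTES    ((size_t)64 * 1024)  /* hypothetical space for code and stack */
#define ALIGN_BYTES       ((size_t)16)         /* assumed transfer alignment */

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up_to(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes of local memory needed for one work block of n_elems elements,
 * with `copies` copies of each buffer (2 for double buffering).
 * Each buffer is padded to the assumed alignment. */
size_t block_footprint(size_t n_elems, size_t in_elem_bytes,
                       size_t out_elem_bytes, size_t copies)
{
    size_t in_bytes = round_up_to(n_elems * in_elem_bytes, ALIGN_BYTES);
    size_t out_bytes = round_up_to(n_elems * out_elem_bytes, ALIGN_BYTES);
    return copies * (in_bytes + out_bytes);
}

/* Largest element count per work block whose footprint fits in budget. */
size_t max_block_elems(size_t budget, size_t in_elem_bytes,
                       size_t out_elem_bytes, size_t copies)
{
    size_t per_elem = copies * (in_elem_bytes + out_elem_bytes);
    size_t n;

    if (per_elem == 0)
        return 0;

    /* Start from the unpadded estimate, then shrink until padding fits. */
    n = budget / per_elem;
    while (n > 0 &&
           block_footprint(n, in_elem_bytes, out_elem_bytes, copies) > budget)
        n--;
    return n;
}
```

Under these assumptions, calling max_block_elems(LOCAL_STORE_BYTES - RESERVED_BYTES, 4, 4, 2) gives 12288 elements per work block for 4-byte input and output elements with double buffering. The same arithmetic shows why the 128 KB input buffer mentioned above is a problem: two copies of it need 256 KB, which is the entire assumed local memory.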