Efficient data partitioning and data layout design are key to a well-performing
ALF application. Improper data partitioning and data layout design either
prevents ALF from being applicable or results in degraded performance. Data
partitioning and layout are closely coupled with compute kernel design and
implementation, so consider them together. Consider
the following when you design your data layout and partitioning:
- Use the correct size for the data partitioned for each work block. The
local memory of the accelerator is often limited, and performance can degrade if
the partitioned data does not fit into the available memory. For
example, on the Cell BE architecture, if the input buffer of a work block is larger
than 128 KB, it might not be possible to support double buffering on the SPE,
because data transfer can then no longer overlap with computation.
This can result in up to 50% performance loss.
- Minimize the amount of data movement. Moving large amounts of data
costs performance, so avoid any data movement that the computation does
not require.
- Simplify data movement patterns. Although the data transfer list feature
of ALF enables flexible data gathering and scattering patterns, it is better
to keep the data movement patterns as simple as possible. Good examples
are sequential access and contiguous transfers in place of many small discrete
transfers.
- Avoid data reorganization. Data reorganization requires extra work. It
is better to organize data in a way that suits the usage pattern of the algorithm
than to write extra code to reorganize the data when it is used.
- Be aware of the address alignment limitations on the Cell BE architecture.
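
The sizing and alignment guidelines above can be sketched in plain C. This is
a minimal illustration only and is not part of the ALF API: the 128 KB data
budget, the 128-byte alignment value, and the helper names round_down,
is_aligned, and max_elems_per_block are all assumptions chosen for the
example. It estimates how many elements fit in one work block when a given
number of buffer copies (2 for double buffering) must share the budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; check the limits of your own platform. */
#define LOCAL_BUDGET_BYTES ((size_t)128 * 1024) /* hypothetical budget for data buffers */
#define ALIGN_BYTES        ((size_t)128)        /* assumed preferred alignment */

/* Round n down to a multiple of align (align must be a power of two). */
static size_t round_down(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return n & ~(align - 1);
}

/* Return 1 if addr is a multiple of align (a power of two), else 0. */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (uintptr_t)(align - 1)) == 0;
}

/*
 * Largest element count per work block such that `copies` sets of
 * input and output buffers fit within `budget` bytes.
 * The count is rounded down to a multiple of ALIGN_BYTES, which is a
 * conservative way to keep every buffer size a multiple of ALIGN_BYTES
 * regardless of the element size.
 */
static size_t max_elems_per_block(size_t budget, size_t in_elem_bytes,
                                  size_t out_elem_bytes, unsigned copies)
{
    assert(copies > 0 && in_elem_bytes + out_elem_bytes > 0);
    size_t per_copy = budget / copies;
    size_t elems = per_copy / (in_elem_bytes + out_elem_bytes);
    return round_down(elems, ALIGN_BYTES);
}
```

With 4-byte input and output elements,
max_elems_per_block(LOCAL_BUDGET_BYTES, 4, 4, 2) yields 8192 elements per
work block under double buffering, compared with 16384 when only one buffer
set is kept (copies = 1). A check such as is_aligned((uintptr_t)ptr, ALIGN_BYTES)
can be applied to host buffers before they are partitioned.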