Based on the characteristics of an application, you can use single-use
work blocks or multi-use work blocks to efficiently implement data partitioning
on the accelerators.
For a given task that can be partitioned into N work blocks, the
following describes how the different types of work blocks can be used, and
the order of function calls per task instance. The description is based on a
single instance of the task running on a single accelerator:
1. Task instance initialization (this is done by the ALF runtime).
2. Conditional execute: alf_accel_task_context_setup is called only if the
   task has context. The runtime calls it after the initial task context data
   has been loaded to the accelerator and before any work blocks are processed.
3. For each work block WB(k):
   - If there are pending context merges, go to Step 4.
   - For each iteration i of a multi-use work block, where i < N (the total
     number of iterations):
     - alf_accel_input_list_prepare(WB(k), i, N): called only when the task
       requires accelerator data partition.
     - alf_accel_comp_kernel(WB(k), i, N): the computational kernel is
       always called.
     - alf_accel_output_list_prepare(WB(k), i, N): called only when the task
       requires accelerator data partition.
4. Conditional execute: alf_accel_task_context_merge is called only when the
   context of another unloaded task instance is to be merged into the current
   instance.
5. If there are pending work blocks, go to Step 3.
6. Write out the task context.
7. Unload the image or remain pending for the next scheduling.
8. If a new task instance is created, go to Step 2.
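The call sequence above can be illustrated with a small host-side mock. This is only a sketch: it does not use the ALF runtime or its real function signatures, and the names mock_trace, mock_wb, record and run_task_instance are hypothetical. It also simplifies the flow by running work blocks strictly one after another and by modeling at most one context merge, after all work blocks have been processed.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_MAX 64

/* Which accelerator-side entry point the mock pretends to invoke. */
enum mock_call {
    CALL_CONTEXT_SETUP,       /* alf_accel_task_context_setup  */
    CALL_INPUT_LIST_PREPARE,  /* alf_accel_input_list_prepare  */
    CALL_COMP_KERNEL,         /* alf_accel_comp_kernel         */
    CALL_OUTPUT_LIST_PREPARE, /* alf_accel_output_list_prepare */
    CALL_CONTEXT_MERGE        /* alf_accel_task_context_merge  */
};

/* One recorded call; wb and iter are -1 for context calls. */
struct mock_event { enum mock_call call; int wb; int iter; };

/* Hypothetical call log, not an ALF type. */
struct mock_trace { struct mock_event ev[TRACE_MAX]; size_t len; };

/* Hypothetical work block descriptor, not an ALF type. */
struct mock_wb {
    int iterations;      /* N; 1 models a single-use work block */
    int needs_partition; /* nonzero: accelerator data partition is required */
};

static void record(struct mock_trace *t, enum mock_call call, int wb, int iter)
{
    assert(t->len < TRACE_MAX);
    t->ev[t->len].call = call;
    t->ev[t->len].wb = wb;
    t->ev[t->len].iter = iter;
    t->len++;
}

/* Records the calls for one task instance in the order described above. */
static void run_task_instance(struct mock_trace *t, const struct mock_wb *wbs,
                              int num_wbs, int has_context, int merge_pending)
{
    int k, i;

    assert(t != NULL && wbs != NULL);

    if (has_context)                       /* context setup, only with context */
        record(t, CALL_CONTEXT_SETUP, -1, -1);

    for (k = 0; k < num_wbs; k++) {        /* each work block WB(k) */
        for (i = 0; i < wbs[k].iterations; i++) {
            if (wbs[k].needs_partition)
                record(t, CALL_INPUT_LIST_PREPARE, k, i);
            record(t, CALL_COMP_KERNEL, k, i);   /* always called */
            if (wbs[k].needs_partition)
                record(t, CALL_OUTPUT_LIST_PREPARE, k, i);
        }
    }

    if (merge_pending)                     /* simplified: one merge at the end */
        record(t, CALL_CONTEXT_MERGE, -1, -1);
    /* Writing out the context and unloading the image are not modeled. */
}
```

For example, a task with context, one single-use work block that needs data partitioning, and one two-iteration multi-use work block that does not, would record six calls in this mock: the context setup, the three calls for WB(0), and two kernel calls for WB(1).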
For Step 3, the calling order of the three function calls is defined by
the following rules:
- For a specific single-use work block WB(k), the following
  calling order is guaranteed:
  - alf_accel_input_list_prepare(WB(k))
  - alf_accel_comp_kernel(WB(k))
  - alf_accel_output_list_prepare(WB(k))
- For two single-use work blocks that are assigned to the same task instance
  in the order of WB(k) and WB(k+1), ALF only
  guarantees the following calling orders:
  - alf_accel_input_list_prepare(WB(k)) is called before
    alf_accel_input_list_prepare(WB(k+1))
  - alf_accel_comp_kernel(WB(k)) is called before
    alf_accel_comp_kernel(WB(k+1))
  - alf_accel_output_list_prepare(WB(k)) is called before
    alf_accel_output_list_prepare(WB(k+1))
- A multi-use work block WB(k, N) is treated as N single-use work blocks
  assigned to the same task instance in order of increasing iteration
  index: WB(k, 0), WB(k, 1), …, WB(k, N-1).
  The only difference is that all these work blocks share the same work block
  parameter and context buffer. Otherwise, the API calling order is still
  determined by the previous two rules. See "Modifying the work block
  parameter and context buffer when using multi-use work blocks".
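The three rules can be read as a set of constraints on a call trace. The sketch below checks a recorded trace against them. It is illustrative only and does not call ALF: the names call_event, flat_seq and order_is_permitted are hypothetical, and it assumes every work block triggers all three calls (that is, the task requires accelerator data partition). A multi-use work block is handled by flattening its iterations into consecutive sequence numbers, as the third rule describes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16

enum stage { STAGE_INPUT = 0, STAGE_KERNEL = 1, STAGE_OUTPUT = 2, STAGE_COUNT = 3 };

/* One recorded call: which function ran, and for which flattened block. */
struct call_event { enum stage stage; int seq; };

/* Flattened sequence number of iteration i of work block k, where iters[j]
   is the iteration count of work block j (1 for a single-use block). */
static int flat_seq(const int *iters, int k, int i)
{
    int j, seq = 0;

    assert(iters != NULL && i >= 0 && i < iters[k]);
    for (j = 0; j < k; j++)
        seq += iters[j];
    return seq + i;
}

/* Returns 1 if the trace satisfies the ordering rules above, otherwise 0. */
static int order_is_permitted(const struct call_event *trace, size_t len,
                              int num_blocks)
{
    size_t pos[STAGE_COUNT][MAX_BLOCKS] = {{0}};
    int seen[STAGE_COUNT][MAX_BLOCKS] = {{0}};
    size_t i;
    int s, k;

    assert(trace != NULL);
    assert(num_blocks > 0 && num_blocks <= MAX_BLOCKS);

    /* Record the position of each call; reject unknown or duplicate calls. */
    for (i = 0; i < len; i++) {
        int st = (int)trace[i].stage;
        int seq = trace[i].seq;

        if (st < 0 || st >= STAGE_COUNT || seq < 0 || seq >= num_blocks)
            return 0;
        if (seen[st][seq])
            return 0;
        seen[st][seq] = 1;
        pos[st][seq] = i;
    }

    /* Assumption of this sketch: every block has all three calls. */
    for (k = 0; k < num_blocks; k++)
        for (s = 0; s < STAGE_COUNT; s++)
            if (!seen[s][k])
                return 0;

    for (k = 0; k < num_blocks; k++) {
        /* Rule 1: input prepare, then kernel, then output prepare. */
        if (pos[STAGE_INPUT][k] >= pos[STAGE_KERNEL][k] ||
            pos[STAGE_KERNEL][k] >= pos[STAGE_OUTPUT][k])
            return 0;
        /* Rule 2: each call type keeps the work block assignment order. */
        if (k + 1 < num_blocks)
            for (s = 0; s < STAGE_COUNT; s++)
                if (pos[s][k] >= pos[s][k + 1])
                    return 0;
    }
    return 1;
}
```

Under these rules as stated, a trace in which the input list of WB(k+1) is prepared before the kernel of WB(k) runs would pass the check, whereas a trace in which the kernel of WB(k+1) runs before its own input list is prepared would not. Whether the runtime actually produces a given interleaving is not specified here, so accelerator code should rely only on the guaranteed orders.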