Using work blocks and order of function calls per task instance on the accelerator

Based on the characteristics of an application, you can use single-use work blocks or multi-use work blocks to efficiently implement data partitioning on the accelerators.

For a given task that can be partitioned into N work blocks, the following describes how the different types of work blocks can be used. It also gives the order of function calls per task instance, based on a single instance of the task on a single accelerator:

1. Task instance initialization (done by the ALF runtime).
2. Conditional execute: alf_accel_task_context_setup is called only if the task has context. The runtime calls it after the initial task context data has been loaded to the accelerator and before any work blocks are processed.
3. For each work block WB(k):
  a. If there are pending context merges, go to Step 4.
  b. For each iteration i of a multi-use work block, where i < N (the total number of iterations):
    1. alf_accel_input_list_prepare(WB(k), i, N): called only when the task requires accelerator data partitioning.
    2. alf_accel_comp_kernel(WB(k), i, N): the computational kernel is always called.
    3. alf_accel_output_list_prepare(WB(k), i, N): called only when the task requires accelerator data partitioning.
4. Conditional execute: alf_accel_task_context_merge is called only when the context of another, unloaded task instance is to be merged into the current instance.
  a. If there are pending work blocks, go to Step 3.
5. Write out the task context.
6. Unload the image, or keep it pending for the next scheduling.
  a. If a new task instance is created, go to Step 2.
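The control flow of Steps 1 to 6 can be modeled as a small simulation that only records the order in which a runtime following these steps would invoke each callback. This is a hypothetical sketch, not ALF code: the event names, the sim_task struct, and run_task_instance are illustrative assumptions, nothing here calls the real ALF API, and the real callback prototypes differ. It also assumes that Step 5 is skipped for a task without context, that Step 4 merges one pending context per visit, and it does not model the "go to Step 2" restart of a new task instance.

```c
#include <assert.h>

/* Hypothetical event log. Names mirror the ALF callbacks in the text, but
   this only records call order; it does not use the ALF API. */
typedef enum {
    EV_INIT, EV_CTX_SETUP, EV_INPUT_PREP, EV_COMP_KERNEL,
    EV_OUTPUT_PREP, EV_CTX_MERGE, EV_CTX_WRITE, EV_UNLOAD
} event_kind;

typedef struct { event_kind kind; int k; int i; int n; } event_t;

#define MAX_EVENTS 128
static event_t events[MAX_EVENTS];
static int n_events = 0;

static void emit(event_kind kind, int k, int i, int n) {
    assert(n_events < MAX_EVENTS);
    events[n_events++] = (event_t){kind, k, i, n};
}

typedef struct {
    int has_context;     /* task has a task context (Steps 2, 4, 5) */
    int accel_partition; /* task requires accelerator data partitioning */
} sim_task;

/* n_iter[k] is the iteration count N of work block k (1 = single-use). */
static void run_task_instance(const sim_task *t, const int *n_iter,
                              int n_wb, int pending_merges) {
    emit(EV_INIT, -1, -1, -1);                          /* Step 1 */
    if (t->has_context) emit(EV_CTX_SETUP, -1, -1, -1); /* Step 2 */

    int k = 0;
    for (;;) {
        /* Step 3: process work blocks; 3a diverts to Step 4 on a pending merge. */
        while (k < n_wb && pending_merges == 0) {
            int n = n_iter[k];
            for (int i = 0; i < n; i++) {               /* Step 3b */
                if (t->accel_partition) emit(EV_INPUT_PREP, k, i, n);
                emit(EV_COMP_KERNEL, k, i, n);
                if (t->accel_partition) emit(EV_OUTPUT_PREP, k, i, n);
            }
            k++;
        }
        /* Step 4: merge one pending context (assumption: one per visit). */
        if (pending_merges > 0) {
            emit(EV_CTX_MERGE, -1, -1, -1);
            pending_merges--;
        }
        /* Step 4a: go back while work blocks (or merges) remain. */
        if (k < n_wb || pending_merges > 0) continue;
        break;
    }
    if (t->has_context) emit(EV_CTX_WRITE, -1, -1, -1); /* Step 5 (assumed skipped without context) */
    emit(EV_UNLOAD, -1, -1, -1);                        /* Step 6 */
}
```

For example, a task with context and accelerator data partitioning, given one single-use block and one multi-use block with N = 2, records 13 events: initialization, context setup, three callbacks for WB(0), three for each of the two iterations of WB(1), context write-out, and unload.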

For Step 3, the calling order of the three function calls is defined by the following rules:

For a specific single-use work block WB(k), the following calling order is guaranteed:
1. alf_accel_input_list_prepare(WB(k))
2. alf_accel_comp_kernel(WB(k))
3. alf_accel_output_list_prepare(WB(k))
For two single-use work blocks that are assigned to the same task instance in the order of WB(k) and WB(k+1), ALF only guarantees the following calling orders:
- alf_accel_input_list_prepare(WB(k)) is called before alf_accel_input_list_prepare(WB(k+1))
- alf_accel_comp_kernel(WB(k)) is called before alf_accel_comp_kernel(WB(k+1))
- alf_accel_output_list_prepare(WB(k)) is called before alf_accel_output_list_prepare(WB(k+1))
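The two rules above can be expressed as a check over a recorded call trace. This is a hypothetical sketch: the fn_kind and call_t types and the order_is_valid function are illustrative, not part of ALF, and it assumes the task requires accelerator data partitioning so that all three callbacks run exactly once per single-use work block.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trace entry: which callback ran, and for which work block. */
typedef enum { FN_INPUT_PREP = 0, FN_COMP_KERNEL = 1, FN_OUTPUT_PREP = 2 } fn_kind;
typedef struct { fn_kind fn; int k; } call_t;

#define MAX_WB 32

/* Returns true if the trace satisfies both documented guarantees:
   (1) per work block: input_prepare, then comp_kernel, then output_prepare;
   (2) per callback: WB(k) is handled before WB(k+1). */
static bool order_is_valid(const call_t *trace, int len, int n_wb) {
    int pos[MAX_WB][3];
    int last_k[3] = {-1, -1, -1};
    assert(n_wb <= MAX_WB);
    for (int k = 0; k < n_wb; k++)
        for (int f = 0; f < 3; f++) pos[k][f] = -1;

    for (int t = 0; t < len; t++) {
        int f = trace[t].fn, k = trace[t].k;
        /* Reject unknown work blocks and repeated calls. */
        if (k < 0 || k >= n_wb || pos[k][f] != -1) return false;
        /* Rule 2: each callback sees work blocks in assignment order. */
        if (k != last_k[f] + 1) return false;
        last_k[f] = k;
        pos[k][f] = t;
    }
    /* Rule 1: every block ran all three callbacks, in order. */
    for (int k = 0; k < n_wb; k++) {
        if (pos[k][0] < 0 || pos[k][1] < 0 || pos[k][2] < 0) return false;
        if (!(pos[k][0] < pos[k][1] && pos[k][1] < pos[k][2])) return false;
    }
    return true;
}
```

Under this check, the interleaved trace input_prepare(WB(0)), input_prepare(WB(1)), comp_kernel(WB(0)), comp_kernel(WB(1)), output_prepare(WB(0)), output_prepare(WB(1)) is accepted. The rules as stated therefore do not guarantee that output_prepare(WB(k)) or even comp_kernel(WB(k)) runs before input_prepare(WB(k+1)), so accelerator code should not rely on that.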

A multi-use work block WB(k,N) is treated as N single-use work blocks assigned to the same task instance in order of increasing iteration index: WB(k,0), WB(k,1), …, WB(k,N-1). The only difference is that all of these work blocks share the same work block parameter and context buffer. Otherwise, the API calling order is still determined by the two preceding rules. See "Modifying the work block parameter and context buffer when using multi-use work blocks".
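Viewing WB(k,N) as N single-use blocks that share one parameter buffer can be sketched as follows. Everything here is hypothetical: wb_param, slice_bounds, comp_kernel_iter, and run_multi_use_wb are illustrative names, and the doubling "kernel" is a placeholder. Because the same buffer is passed to every iteration, this sketch keeps it read-only and derives each iteration's slice purely from i and N, so no iteration depends on state left behind by another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter buffer shared by all N iterations of one
   multi-use work block: it describes the whole range the block covers. */
typedef struct {
    const float *src;  /* input array for the whole work block */
    float *dst;        /* output array for the whole work block */
    size_t len;        /* number of elements in the work block */
} wb_param;

/* Slice handled by iteration i of N: elements [begin, end), computed only
   from (i, N) and the read-only shared parameters. */
static void slice_bounds(const wb_param *p, unsigned i, unsigned n,
                         size_t *begin, size_t *end) {
    assert(n > 0 && i < n);
    *begin = p->len * i / n;
    *end = p->len * (i + 1) / n;
}

/* Hypothetical stand-in for one iteration of the computational kernel. */
static void comp_kernel_iter(const wb_param *p, unsigned i, unsigned n) {
    size_t b, e;
    slice_bounds(p, i, n, &b, &e);
    for (size_t j = b; j < e; j++)
        p->dst[j] = 2.0f * p->src[j];
}

/* WB(k,N) viewed as WB(k,0), ..., WB(k,N-1): the same buffer every time. */
static void run_multi_use_wb(const wb_param *p, unsigned n) {
    for (unsigned i = 0; i < n; i++)
        comp_kernel_iter(p, i, n);
}
```

With len = 10 and N = 3, the iterations cover [0, 3), [3, 6), and [6, 10), so every element is processed exactly once regardless of how the iterations are interleaved with other callbacks.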