Error handling

DaCS lets an application register user-created error handlers, which are called under certain error conditions. The handlers can be called for both synchronous and asynchronous errors.

In SDK 3.0, any synchronous error reported to the error handlers causes the process to abort. This happens when DaCS has detected a fatal error from which it cannot recover. Asynchronous errors include child failures (in a host process) and termination requests from a parent (in an accelerator process). Abnormal child termination causes the parent to abort after all registered error handlers have been called.

A normal child exit with a non-zero status will be reported asynchronously to the error handlers, but will not cause the process to abort. This allows the parent process to determine if the non-zero exit represents an error condition.
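The distinction between these asynchronous cases can be sketched as a small classification helper that a handler might apply to the value it obtains from dacs_error_num(). This is only a minimal sketch under stated assumptions: the EX_STS_* constants are local stand-ins whose values mirror the sample output later in this section (real code would use the DACS_STS_PROC_FAILED and DACS_STS_PROC_ABORTED names from the DaCS headers), and child_event_t, classify_child_event() and exit_code_is_error() are hypothetical names that are not part of DaCS.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the DACS_ERR_T values (assumption: the numbers are taken
   from the sample output in this section; use the DaCS header names in
   real code). */
enum {
    EX_STS_PROC_FAILED  = 4,  /* child exited normally, non-zero status */
    EX_STS_PROC_ABORTED = 5   /* child terminated abnormally */
};

/* Hypothetical classification of an asynchronous child event. */
typedef enum {
    CHILD_EXIT_NONZERO,         /* reported; parent keeps running       */
    CHILD_ABNORMAL_TERMINATION, /* parent aborts after handlers return  */
    CHILD_EVENT_OTHER           /* anything else                        */
} child_event_t;

child_event_t classify_child_event(int err_num)
{
    switch (err_num) {
    case EX_STS_PROC_FAILED:  return CHILD_EXIT_NONZERO;
    case EX_STS_PROC_ABORTED: return CHILD_ABNORMAL_TERMINATION;
    default:                  return CHILD_EVENT_OTHER;
    }
}

/* Hypothetical application policy: a non-zero exit status counts as an
   error unless it appears in the caller's list of tolerated codes. */
int exit_code_is_error(uint32_t code, const uint32_t *tolerated, size_t n)
{
    size_t i;

    if (code == 0)
        return 0;
    for (i = 0; i < n; i++) {
        if (tolerated[i] == code)
            return 0;
    }
    return 1;
}
```

With a tolerated list containing only 9, for example, an accelerator exit status of 9 would be treated as benign, while any other non-zero status would be flagged for the parent to act on.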

When a user error handler is called, it is passed an error object describing the error, which can be inspected using the services provided. The error object contains the DE and PID of the failing process. These can be used to call dacs_de_test() to reap the status of that process, which allows another process to be started on that DE.

The DaCS library uses the SIGTERM signal for handling asynchronous errors and termination requests. A dedicated error handling thread is created in dacs_runtime_init() for this purpose. Applications using the DaCS library should not create any application threads before calling dacs_runtime_init(), and no application thread should unmask this signal.

User error handler example

User error handler registration

This example creates a user error handler called my_errhandler. Once the handler has been defined, it can be registered using the dacs_errhandler_reg API:

dacs_rc = dacs_errhandler_reg((dacs_error_handler_t)&my_errhandler,0);

Note: If the address of my_errhandler is not passed, or the cast to dacs_error_handler_t is omitted, the compiler will produce warnings.

User error handler code:

/****************************************************************
Example of a user error handler.
It calls the accessor functions provided for the
"dacs_error_t" error parameter that is passed in.
****************************************************************/
int my_errhandler(dacs_error_t error){
        /*need local variables for passback of values */
        DACS_ERR_T dacs_rc=0;
        DACS_ERR_T dacs_error_rc;//hold code for error
        de_id_t de=0;
        dacs_process_id_t pid=0;
        uint32_t code = 0;
        const char * error_string;
   
        /* Get the DACS_ERR_T in the error to learn what happened */
        printf("\n\n--in my_dacs_errhandler\n");
        dacs_error_rc=dacs_rc=dacs_error_num(error);
        printf("  dacs_error_num indicates DACS_ERR_T=%d %s\n",
               dacs_rc,dacs_strerror(dacs_rc));

        /* Get the exit code in the error to learn what happened */
        dacs_rc=dacs_error_code(error,&code);
        if(dacs_rc){//if error invoking dacs_error_code
          printf("  dacs_error_code call had error DACS_ERR_T=%d %s\n",
                 dacs_rc,dacs_strerror(dacs_rc));
        }
        else {
          if (DACS_STS_PROC_ABORTED==dacs_error_rc){
             printf("  dacs_error_code signal signal=%d  ",code);
          }
          else if (DACS_STS_PROC_FAILED==dacs_error_rc){
             printf("  dacs_error_code exit code=%d\n",code);
          }
          else {//else reason is different than aborted or failed
             printf("  dacs_error_code exit/signal code=%d\n",code);      
          }
        }

        /* Get the error string in the error to learn what happened */
        dacs_rc=dacs_error_str(error,&error_string);
        if(dacs_rc){//if error invoking dacs_error_str
          printf("  dacs_error_str call had error DACS_ERR_T=%d %s\n",
                 dacs_rc,dacs_strerror(dacs_rc));
        }
        else {
          printf("  dacs_error_str=%s\n",error_string);
        }
         
        /* what DE had this error ? */
        dacs_rc=dacs_error_de(error,&de);
        if(dacs_rc){//if error invoking dacs_error_de
          printf("  dacs_error_de call had error DACS_ERR_T=%d %s\n",
                 dacs_rc,dacs_strerror(dacs_rc)); 
        }
        else {
          printf("  dacs_error_de=%08x\n",de);
        }

        /* what was the dacs_process_id_t? */
        dacs_rc=dacs_error_pid(error,&pid);
        if(dacs_rc){//if error invoking dacs_error_pid
          printf("  dacs_error_pid call had error "
                 "DACS_ERR_T=%d %s\n",dacs_rc,dacs_strerror(dacs_rc));
        }
        else {
          printf("  dacs_error_pid=%ld\n",pid);
        }
        printf("exiting user error handler\n\n");
        return 0;//in SDK 3.0, return value is ignored
}

User error handler output

Example output if the accelerator program exits with a return code of 9:

--in my_dacs_errhandler
  dacs_error_num indicates DACS_ERR_T=4 DACS_STS_PROC_FAILED
  dacs_error_code exit code=9
  dacs_error_str=DACS_STS_PROC_FAILED
  dacs_error_de=01020200
  dacs_error_pid=5503
exiting user error handler

Example output if the accelerator program aborts:

--in my_dacs_errhandler
  dacs_error_num indicates DACS_ERR_T=5 DACS_STS_PROC_ABORTED
  dacs_error_code signal signal=6    dacs_error_str=DACS_STS_PROC_ABORTED
  dacs_error_de=01020200
  dacs_error_pid=5894
exiting user error handler