Design/Approach
Identification of hotspots:
We profiled the solver with gprof to identify the functions that account for most of the execution time.
Data and execution model:
The solver partitions its data according to the number of processors and the processor topology, and each process works only on its own chunk of the data. Within a process, the code uses OpenMP parallel regions so that the computation-intensive sections are executed by multiple threads.
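The decomposition described above can be illustrated with a minimal sketch. It assumes a 2D grid of nx by ny cells split over a px by py processor topology with a simple block distribution; the solver's actual partitioning scheme is not given in this report, and the names block_range and update_local_chunk are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <vector>

// Half-open index range [begin, end) owned by one process along one axis.
struct Range {
    int begin;
    int end;
};

// Hypothetical helper: split n cells over `parts` processes as evenly as
// possible. The first (n % parts) processes receive one extra cell.
Range block_range(int n, int parts, int coord) {
    assert(parts > 0 && coord >= 0 && coord < parts);
    int base = n / parts;
    int extra = n % parts;
    int begin = coord * base + (coord < extra ? coord : extra);
    int size = base + (coord < extra ? 1 : 0);
    return {begin, begin + size};
}

// Hypothetical helper: the work of one process. The rank is mapped to a
// coordinate in the assumed px-by-py topology, and only the cells of that
// process's own chunk are updated. Returns the number of cells updated.
int update_local_chunk(std::vector<double>& field, int nx, int ny,
                       int px, int py, int rank) {
    Range rx = block_range(nx, px, rank % px);
    Range ry = block_range(ny, py, rank / px);

    // Within a process, the rows of the chunk are shared among OpenMP
    // threads (the pragma is ignored when OpenMP is not enabled).
    #pragma omp parallel for
    for (int j = ry.begin; j < ry.end; ++j) {
        for (int i = rx.begin; i < rx.end; ++i) {
            field[j * nx + i] += 1.0;
        }
    }
    return (rx.end - rx.begin) * (ry.end - ry.begin);
}
```

Calling update_local_chunk once for every rank from 0 to px * py - 1 touches each cell of the field exactly once, which is the property the partitioning relies on: the chunks cover the whole domain and do not overlap.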
Our solution:
We ported these OpenMP regions to the GPU, along with several other time-consuming loops that were not parallelized with OpenMP, so that GPU threads take the place of OpenMP threads. The previous report identified some of the hotspots and the issues that could arise while porting them. The next section gives an overview of how some of these hotspots were ported.
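As a rough illustration of replacing OpenMP threads with GPU threads, the sketch below shows the same loop body driven in two ways: by an OpenMP parallel loop, and by a block/thread index mapping of the kind a GPU kernel launch uses. The GPU version here is only emulated on the host with ordinary loops so that the example stays plain C++; this report does not state which GPU programming model was used, and the names axpy_body, axpy_openmp and axpy_gpu_style are hypothetical, not taken from the solver.

```cpp
#include <cassert>
#include <vector>

// Loop body shared by both versions: one independent update per index i.
inline void axpy_body(int i, double a, const std::vector<double>& x,
                      std::vector<double>& y) {
    y[i] = a * x[i] + y[i];
}

// OpenMP style: the iteration space is split among a few CPU threads.
// Assumes x and y have the same length.
void axpy_openmp(double a, const std::vector<double>& x,
                 std::vector<double>& y) {
    const int n = static_cast<int>(x.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        axpy_body(i, a, x, y);
    }
}

// GPU style, emulated on the host: every (block, thread) pair handles one
// index. On a real GPU the two outer loops would not exist; each pair
// would run as its own hardware thread.
void axpy_gpu_style(double a, const std::vector<double>& x,
                    std::vector<double>& y, int threads_per_block) {
    assert(threads_per_block > 0);
    const int n = static_cast<int>(x.size());
    // Round up so that the last, possibly partial, block is included.
    const int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    for (int block = 0; block < num_blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int i = block * threads_per_block + thread;  // global index
            if (i < n) {  // guard: the last block may overshoot n
                axpy_body(i, a, x, y);
            }
        }
    }
}
```

Both functions produce the same result for the same inputs. The point of the sketch is the structural change: the loop counter disappears and is replaced by an index computed from the block and thread identifiers, plus a bounds check for the final block.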