Hardware Acceleration

PROJECT PROGRESS

MILESTONE: NOV/11/2009


Our team has remained on schedule with our proposed timeline. We have read background papers and presentations on the math and algorithms used by the AMG benchmark. This gave us a clear understanding of how its functions could be parallelized and how its data could be divided.

With this knowledge, we profiled the AMG benchmark on the Henry 2 system using one, two, four and eight processors. We used gprof for the analysis, making minor modifications to the AMG source code so that gprof could profile all tasks simultaneously and report the hot spots. We concluded that the best three functions to parallelize were hypre_BoomerAMGBuildCoarseOperator, hypre_BoomerAMGRelax and hypre_CSRMatrixMatvec. We chose these functions because they consistently appeared in our profiling results across the processor counts we tested, and because their source code is inherently parallel: reading it, we found that each of the three contains regions already parallelized with the OpenMP for pragma.
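
hypre_CSRMatrixMatvec computes a sparse matrix-vector product on a matrix stored in compressed sparse row (CSR) form. The sketch below is a simplified stand-in, not hypre's source: the struct and function names are hypothetical, and it assumes the y = alpha*A*x + beta*y form of the product. It illustrates why the routine parallelizes well: each output row reads only its own slice of the row pointer, column index and value arrays, so rows can be split across OpenMP threads or, in our port, across SPEs.

```c
/* Minimal CSR matrix; field names are illustrative, not hypre's struct. */
typedef struct {
    int num_rows;
    const int *row_ptr;   /* length num_rows + 1 */
    const int *col_idx;   /* length nnz */
    const double *vals;   /* length nnz */
} csr_matrix;

/* y = alpha*A*x + beta*y.
   Rows are independent, so the outer loop can be divided by row ranges
   among OpenMP threads (or, by the same reasoning, among SPEs). */
void csr_matvec(double alpha, const csr_matrix *A, const double *x,
                double beta, double *y)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < A->num_rows; i++) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->vals[k] * x[A->col_idx[k]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

For the 3x3 matrix with rows (4, -1, 0), (-1, 4, -1), (0, -1, 4), x = (1, 2, 3), alpha = 1 and beta = 0, the result is y = (2, 4, 10). Compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.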

Completing this analysis opened up several branches of work. First, we were ready to begin porting the source code to the SPEs. Each of us took one function and began by analyzing the variables it uses. We found that if we classify every variable used in the section being ported into one of three categories, we know exactly what data must be communicated in each direction. Private variables have no initial value, so they are never transferred; each SPE initializes its own copy. Read-only variables are read during the ported section but initialized before it, so they must be transferred to the SPE when it is started but do not need to be transferred back, since the SPE does not alter them. Read-write variables are modified by the SPE, so they must be transferred back to the benchmark when the SPE has finished, and any that were initialized before the region must also be transferred to the SPE beforehand. This classification defines all of the communication that takes place before and after execution on the SPE.
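
The three categories map directly onto the transfers each variable needs. The sketch below illustrates this under stated assumptions: the enum, struct and stage_in/stage_out names are hypothetical, memcpy stands in for the DMA get and put, and read-write variables are conservatively copied in both directions, since a variable that is only written in the region would not strictly need the inbound copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for variables used inside a ported region. */
typedef enum { VAR_PRIVATE, VAR_READ_ONLY, VAR_READ_WRITE } var_class;

/* Bit flags for the direction(s) a variable must travel. */
enum { XFER_NONE = 0, XFER_TO_SPE = 1, XFER_FROM_SPE = 2 };

int transfer_directions(var_class c)
{
    switch (c) {
    case VAR_READ_ONLY:
        return XFER_TO_SPE;                 /* needed on the SPE, never changed */
    case VAR_READ_WRITE:
        return XFER_TO_SPE | XFER_FROM_SPE; /* conservative: in and back out */
    case VAR_PRIVATE:
    default:
        return XFER_NONE;                   /* each SPE initializes its own copy */
    }
}

/* One variable: its main-memory copy and its (simulated) local-store copy. */
typedef struct {
    const char *name;
    var_class cls;
    void *ppe_addr;
    void *spe_addr;
    size_t size;
} region_var;

/* Before the SPE runs: copy read-only and read-write data in. */
void stage_in(const region_var *vars, int n)
{
    for (int i = 0; i < n; i++)
        if (transfer_directions(vars[i].cls) & XFER_TO_SPE)
            memcpy(vars[i].spe_addr, vars[i].ppe_addr, vars[i].size);
}

/* After the SPE finishes: copy only read-write data back. */
void stage_out(const region_var *vars, int n)
{
    for (int i = 0; i < n; i++)
        if (transfer_directions(vars[i].cls) & XFER_FROM_SPE)
            memcpy(vars[i].ppe_addr, vars[i].spe_addr, vars[i].size);
}
```

Private variables never appear in either pass, and read-only variables appear only in the inbound pass, which keeps the communication to the minimum the classification allows.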

While we were working on that task, we also worked in the CellBE environment to learn how to build programs that combine MPI, multiple C source files and SPE kernels. We first investigated the Cell Messaging Layer (CML) as a means of communicating with the SPEs. After repeatedly fixing one error in compiling the CML libraries only to run into another, we decided to cut our losses and communicate exclusively through DMA transfers, a process we all had experience with from a previous homework assignment.
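
Because all PPE-SPE communication now goes through DMA, every transfer has to respect the MFC's limits: a single DMA command moves at most 16 KB, and transfers of 16 bytes or more must be 16-byte aligned on both ends with a size that is a multiple of 16 bytes. The sketch below shows one way to split a larger buffer into legal pieces. It only handles the 16-byte-multiple case, the function and macro names are hypothetical, and memcpy stands in for the transfer; on an SPU each piece would instead be issued with mfc_get from spu_mfcio.h and completed by waiting on its tag group.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_MAX_BYTES 16384u /* largest single MFC DMA transfer (16 KB) */
#define DMA_ALIGN     16u    /* alignment and size granularity enforced here */

/* Copy `size` bytes from `ea` (main memory) to `ls` (local store) in
   pieces of at most 16 KB. Returns the number of pieces issued, or -1 if
   the addresses or size break the 16-byte rule this sketch enforces. */
int chunked_get(void *ls, const void *ea, size_t size)
{
    if (size % DMA_ALIGN != 0 ||
        (uintptr_t)ls % DMA_ALIGN != 0 ||
        (uintptr_t)ea % DMA_ALIGN != 0)
        return -1;

    int pieces = 0;
    for (size_t off = 0; off < size; off += DMA_MAX_BYTES) {
        size_t n = size - off < DMA_MAX_BYTES ? size - off : DMA_MAX_BYTES;
        /* Stand-in for one DMA command; on an SPU this would be mfc_get. */
        memcpy((char *)ls + off, (const char *)ea + off, n);
        pieces++;
    }
    return pieces;
}
```

A 40,000-byte buffer, for example, goes over as two 16 KB pieces followed by one 7,232-byte piece.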

We are now finishing the port to the SPEs and will move on to debugging and testing. Once each of us completes this step, we can begin analyzing our results with all of the modified functions working together.