The goal of our project is to assess the benefits of hardware acceleration using the CellBE in an MPI environment. To create such a hybrid environment, we need to use the Cell Messaging Layer, a communication library for the Cell Broadband Engine, which many people recognize as the PlayStation 3's microprocessor. To evaluate the benefits of hardware acceleration, we chose a large benchmark, AMG, an Algebraic Multigrid linear system solver for unstructured mesh physics packages. This benchmark is written in C and was developed only to evaluate the parallelism of systems that use MPI and/or OpenMP.

Our goal is to port the AMG benchmark to a CellBE/MPI hybrid environment. First, we need to profile the benchmark with mpiP and gprof to identify its main performance hotspot(s). Then, we must recode the hotspot(s) as a kernel on an accelerator using the Cell Messaging Layer libraries in addition to the Cell SDK. Next, we add DMA-based data movement. Finally, we compare the performance before and after for different numbers of nodes.

The challenges that we expect in this project are:
• Understand the AMG benchmark
• Understand the Cell Messaging Layer
• Identify the main hotspot(s) in terms of performance in computation and communication
• Incorporate the benchmark's main hotspot(s) as a kernel
• Compare the performance before and after the integration
• Optimize the benchmark's main hotspot(s) for the accelerators

APPROACH OUTLINE

In order to develop a solution using the benefits of hardware acceleration for our ...

TASK DESCRIPTION
T. 1: Understand the problem in the AMG benchmark
T. 2: Successfully compile the benchmark for the henry2 system
T. 3: Profile the benchmark for different input sizes and numbers of nodes to determine the hotspots
T. 4: Study the source code of the functions in the hotspots
T. 5: Prepare a plan to split the workload of these functions across the available nodes
    a. Decide what data we need to share
    b. What data is private to each node
    c. How to split the loops and conditionals
T. 6: Get familiar with the Cell Messaging Layer
T. 7: Start coding (MPI calls, DMAs, calculations)
T. 8: Start testing for new hotspots (communication lag, bottlenecks and imbalances)
T. 9: Collect data for the final implementation
T. 10: Compare this with the unmodified data and the data from the initial implementation
T. 11: Documentation / webpage maintenance / finalize and prepare a report

PROJECT GANTT CHART

REFERENCES

AMG benchmark page
https://asc.llnl.gov/sequoia/benchmarks/#amg

OpenMP and Cell
http://portal.acm.org/citation.cfm?id=1462816

Cell in Scientific Computing
http://portal.acm.org/citation.cfm?id=1128027&dl=GUIDE&coll=GUIDE&CFID=58387821&CFTOKEN=88238559

Cell Wiki Page
http://en.wikipedia.org/wiki/Cell_%28microprocessor%29

Cell BE documentation page
http://www.ibm.com/developerworks/power/cell/documents.html?S_TACT=105AGX16&S_CMP=LP

A parallel algorithm for algebraic multigrid - paper
http://delivery.acm.org/10.1145/990000/986604/p285-zhao.pdf?key1=986604&key2=2089756521&coll=GUIDE&dl=GUIDE&CFID=58396150&CFTOKEN=52196232

Cell Messaging Layer
http://www.ccs3.lanl.gov/~pakin/software/cellmessaging/

HW4 Project Page
http://courses.ncsu.edu/csc548/lec/001/hw/hw4/hw4.html

A. Kejariwal and C. Cascaval. Parallelization Spectroscopy: Analysis of Thread-level Parallelism in HPC Programs. In The 2nd Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA) 2009, pages 30-39, June 2009.