CSC791A Group 5 Project Home

People

Project Goal

For our final project, we plan to modify Stanford's SUIF compiler framework to generate parallel code using MPI, the de facto standard for distributed-memory multiprocessing. While SUIF's optimizations currently target shared-memory multiprocessors, we would like to show that they can be retargeted to distributed memory while maintaining acceptable performance.
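
To make the project goal concrete, the sketch below (our own illustration, not SUIF output) shows the kind of transformation involved: a loop that a shared-memory compiler would simply split across threads is instead block-distributed across MPI ranks, with an explicit reduction standing in for shared memory. The loop body and bounds are placeholders.

    /* Illustrative sketch: block-distributing a parallel loop across
     * MPI ranks. On a shared-memory machine, all iterations would run
     * over one shared array; here each rank owns a contiguous block of
     * the iteration space, and results are combined explicitly. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        double local_sum = 0.0, global_sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Block-distribute iterations 0..N-1 across the ranks. */
        int chunk = (N + nprocs - 1) / nprocs;
        int lo = rank * chunk;
        int hi = (lo + chunk < N) ? lo + chunk : N;

        for (int i = lo; i < hi; i++)
            local_sum += (double)i * i;   /* placeholder loop body */

        /* Explicit communication replaces the shared memory. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", global_sum);

        MPI_Finalize();
        return 0;
    }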

Progress Report 1 Summary (Full PDF)

In order to achieve our project goal, we plan to select a simple tiling example, either from [1] or [2], and study it in both its optimized and unoptimized forms. After becoming familiar with SUIF's runtime parallel library and how well it maps to MPI, we will attempt to get that example working as a SUIF pass with a modified runtime library. If we find that the MPI version is inefficient due to fine-grained parallelism, we will attempt to modify SUIF's analysis passes to detect coarser-grained parallelism and compare performance.
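
As a concrete illustration of tiling (a toy example of our own, not one taken from [1] or [2]), the sketch below shows a stencil loop nest before and after tiling. Each B x B tile becomes a unit of parallel work, so the tile size directly controls the granularity of parallelism, which is exactly the knob we expect to matter for MPI performance.

    /* A simple 2-D stencil, untiled and tiled. With tiling, each
     * B x B tile (rather than each iteration) is a candidate unit of
     * work: larger tiles mean coarser-grained parallelism and fewer,
     * larger messages under MPI. */
    #define N 1024
    #define B 64    /* tile size */

    double a[N][N], b[N][N];

    void untiled(void)
    {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                + a[i][j-1] + a[i][j+1]);
    }

    void tiled(void)
    {
        for (int ii = 1; ii < N - 1; ii += B)       /* tile loops */
            for (int jj = 1; jj < N - 1; jj += B)
                for (int i = ii; i < (ii + B < N - 1 ? ii + B : N - 1); i++)
                    for (int j = jj; j < (jj + B < N - 1 ? jj + B : N - 1); j++)
                        b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                        + a[i][j-1] + a[i][j+1]);
    }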

Progress Report 2 Summary (Full PDF)

Based on feedback from Dr. Mueller, our goals for this phase of the project were to:

  1. select a benchmark with existing serial, shared-memory, and MPI implementations to serve as a reference;
  2. build and test the SUIF2 compiler infrastructure;
  3. learn how to write a SUIF pass in preparation for implementing our modifications.

For the first goal, we looked into the NAS Parallel Benchmarks [4, 5]. Most of the NAS benchmarks come with serial, OpenMP, and MPI implementations, which makes the suite well-suited for our purposes. Among the benchmarks, we selected MG (a multigrid solver) as our starting point. It contains four key routines that each perform some computation and then synchronize data between the memories of different processors using communication routines.
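
The sketch below shows that compute-then-communicate structure in miniature: a one-dimensional "halo" exchange written in C with MPI, simplified from the three-dimensional boundary exchanges MG actually performs (the array layout, sizes, and message tags here are our own).

    /* Simplified sketch of MG's compute-then-communicate pattern:
     * after relaxing its local slab, each rank exchanges one-deep
     * "ghost" boundary cells with its neighbors. MG does the 3-D
     * analogue of this; the 1-D version only shows the structure. */
    #include <mpi.h>

    #define NLOC 256    /* interior points per rank; u has 2 ghost cells */

    void exchange_halos(double u[NLOC + 2], int rank, int nprocs)
    {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send first interior point left; receive right ghost cell. */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Send last interior point right; receive left ghost cell. */
        MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }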

We also managed to compile and test SUIF2 on Red Hat Enterprise Linux 3, though this was far from trivial. Because SUIF has not been actively developed for five years, we had to modify our build environment before it would compile. In particular, we needed to downgrade to gcc 2.95, install several nonstandard libraries, and modify portions of SUIF and the Omega dependence analysis library. After building SUIF, we successfully compiled a nontrivial C file into SUIF code.

Finally, to prepare for the next major phase of the project, we studied the documentation for SUIF and learned how to write a pass so that we can implement our modifications.

In the future, we need to learn more about SUIF's dependence analysis framework, write simplified C implementations of the serial, OpenMP, and MPI versions of MG, and begin implementing our SUIF pass.
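
To give a sense of what those simplified versions might contain, here is a stripped-down serial kernel of the general shape we have in mind. The real benchmark is written in Fortran and applies 27-point stencils over a hierarchy of grids, so this single-level 7-point relaxation is illustrative only.

    /* Illustrative serial kernel for a simplified MG: a single-level
     * 7-point relaxation. The actual benchmark uses 27-point stencils
     * and restricts/prolongates between grid levels. */
    #define N 64

    static double u[N][N][N], r[N][N][N];

    void relax(double c0, double c1)
    {
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                for (int k = 1; k < N - 1; k++)
                    u[i][j][k] = c0 * r[i][j][k]
                               + c1 * (r[i-1][j][k] + r[i+1][j][k]
                                     + r[i][j-1][k] + r[i][j+1][k]
                                     + r[i][j][k-1] + r[i][j][k+1]);
    }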

References

  1. S. Hiranandani, K. Kennedy, and C. Tseng, "Compiling Fortran D for MIMD Distributed-Memory Machines," Communications of the ACM, vol. 35, no. 8, pp. 67-80, Aug. 1992. (PDF)
  2. G. Goumas, N. Drosinos, M. Athanasaki, and N. Koziris, "Automatic Parallel Code Generation for Tiled Nested Loops," Proceedings of the 2004 ACM Symposium on Applied Computing, pp. 1412-1419, ACM Press, 2004. (PDF)
  3. M. W. Hall, J. M. Anderson, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, E. Bugnion, and M. S. Lam, "Maximizing Multiprocessor Performance with the SUIF Compiler," IEEE Computer, Dec. 1996. (PDF)
  4. D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga, "The NAS Parallel Benchmarks," International Journal of Supercomputer Applications, vol. 5, no. 3, pp. 63-73, Fall 1991. (PDF)
  5. D. Bailey, T. Harris, W. Saphir, and R. van der Wijngaart, "The NAS Parallel Benchmarks 2.0," Report NAS-95-020, Numerical Aerodynamic Simulation Facility, NASA Ames Research Center, Dec. 1995. (PDF)