NCSU CSC548 Parallel Computer Project 4
Homework 4
Project Proposal
Introduction
This is a course project for NCSU CSC548 Parallel Computer. The purpose of this project is to add support for MPI I/O (the MPI_File_xxx functions) to the record framework, test it with a small application (a PI program) and a large application (the Parallel I/O benchmark), and execute the implementation on a cluster of parallel computers. The performance impact of the added support is then evaluated on the Parallel I/O benchmark applications.
Problem Description
The record framework's purpose is to capture and record a compressed trace of all MPI communication performed by an MPI application for lossless replay. The record application provides hooks into every MPI call regardless of MPI implementation.
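As an illustration of this interposition style, the following is a minimal sketch of how a hook for one MPI I/O routine might look, assuming the framework intercepts calls through the standard MPI profiling (PMPI) layer; record_event is a hypothetical placeholder for the framework's actual trace-recording logic, not part of the real record code.

    #include <stdio.h>
    #include <mpi.h>

    /* Hypothetical stand-in for the record framework's trace logger. */
    static void record_event(const char *name)
    {
        fprintf(stderr, "record: %s\n", name);
    }

    /* The wrapper intercepts MPI_File_write, records the event, and then
     * forwards the call to the MPI library through the PMPI entry point. */
    int MPI_File_write(MPI_File fh, const void *buf, int count,
                       MPI_Datatype datatype, MPI_Status *status)
    {
        record_event("MPI_File_write");
        return PMPI_File_write(fh, buf, count, datatype, status);
    }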
MPI I/O is a relatively new part of the MPI standard that defines a set of routines for transferring data to and from external storage. It offers a number of advantages over traditional language I/O:
- Flexibility - MPI I/O provides mechanisms for collective access (many processes collectively read and write a single file), asynchronous I/O, and strided access (see the sketch after this list).
- Portability - Many platforms support the MPI I/O interface, so programs should compile and run essentially unchanged.
- Interoperability - Files written with MPI I/O are portable across platforms.
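To make the routines concrete, here is a minimal usage sketch of collective MPI I/O (the file name out.dat and the buffer size are arbitrary choices for illustration, not taken from the benchmarks): every rank writes one block of a single shared file at a rank-dependent offset.

    #include <mpi.h>

    /* Minimal sketch: every rank writes its own block of a shared file
     * using collective MPI I/O. */
    int main(int argc, char **argv)
    {
        int rank, buf[1024];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 1024; i++)
            buf[i] = rank;

        MPI_File_open(MPI_COMM_WORLD, "out.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* Each rank writes at an offset determined by its rank
         * (strided, collective access to a single file). */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof(buf),
                              buf, 1024, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }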
The current version of the record framework does not support MPI I/O, which is a relatively new standard. A goal of this project is to expand the record framework's capability by adding support for the MPI I/O routines.
Project Outline
This is a relatively new project topic for me, and I am unfamiliar with the benchmark applications, so I plan the following steps to prepare for the project implementation and benchmark evaluation:
- Study relevant project material online
- Understand the usage of the benchmark applications through online material and hands-on practice with small applications such as MMUL and PI
- Implement MPI I/O support in the record tool and test it on the Parallel I/O benchmarks
- Execute on a small application first to ensure implementation correctness
- Test on the large Parallel I/O benchmark applications
- Collect empirical benchmark data on the Parallel I/O benchmarks before and after the MPI I/O implementation
The evaluation will be based on the empirical data collected as described above. The experiment will be conducted on a cluster of sixteen parallel computers. Each machine is a dual-processor AMD Athlon XP 1900+ node with 64 KB split L1 instruction/data caches and a 256 KB unified L2 cache.
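One plausible way to collect this data is to time each benchmark's I/O phase with MPI_Wtime, once for the baseline run and once under the MPI I/O-capable record tool, and report the slowest rank's time; in the sketch below, do_io_phase is a hypothetical placeholder for the benchmark's actual I/O work.

    #include <stdio.h>
    #include <mpi.h>

    /* Hypothetical placeholder for a benchmark's I/O phase. */
    static void do_io_phase(void) { /* ... */ }

    int main(int argc, char **argv)
    {
        int rank;
        double start, elapsed, max_elapsed;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);     /* align ranks before timing */
        start = MPI_Wtime();
        do_io_phase();
        elapsed = MPI_Wtime() - start;

        /* The slowest rank determines the observed I/O time. */
        MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("I/O phase: %.6f s\n", max_elapsed);

        MPI_Finalize();
        return 0;
    }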
Plan of Work
Week One (10/30/06-11/04/06)
- Understand the problem; learn the architecture of the record framework and the semantics and usage of the MPI I/O routines
- Install the MPI Trace Compression source code (/home/student/secret/record.tgz) on the OSxx machine
- Download and run record on the unmodified Parallel I/O benchmarks to obtain baseline data for comparison with the data from record with MPI I/O support
Week Two (11/05/06-11/11/06)
- Submit progress report (Homework 5)
- Implement MPI I/O in the small PI application; execute the modified PI program on the cluster to verify correctness (see the sketch after this list)
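A hedged sketch of what the modified PI test might look like (not the course's actual PI code; the file name pi.out and the interval count are arbitrary): each rank integrates part of 4/(1+x^2), the partial sums are reduced to rank 0, and the result is written through MPI I/O so that the record tool has an MPI_File call to capture.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        const int n = 1000000;          /* integration intervals */
        int rank, size;
        double h, sum = 0.0, pi = 0.0;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Midpoint rule for the integral of 4/(1+x^2) on [0,1]. */
        h = 1.0 / n;
        for (int i = rank; i < n; i += size) {
            double x = h * (i + 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        sum *= h;
        MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        /* Write the result with MPI I/O instead of standard C I/O. */
        MPI_File_open(MPI_COMM_WORLD, "pi.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        if (rank == 0)
            MPI_File_write(fh, &pi, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }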
Week Three (11/12/06-11/18/06)
- Implement MPI I/O on the Parallel I/O benchmark applications
- Debug the MPI I/O implementation
Week Four (11/19/06-11/25/06)
- Test/execute the Parallel I/O benchmarks on the cluster with the MPI I/O-capable record tool
- Collect Parallel I/O benchmark data
- Generate tables/charts/graphs from the collected data
Week Five (11/26/06-11/27/06)
- Complete the final project report
- Rerun benchmarks if necessary
References
- M. Noeth, F. Mueller, M. Schulz, B. de Supinski, Scalable Compression and Replay of Communication Traces in Massively Parallel Environments, submitted
- F. Mueller, MPI I/O Trace Compression presentation, 2006
- README, record.tgz
- Introduction to MPI I/O, http://www.nersc.gov/nusers/resources/software/libs/io/mpiio.php#concepts