************************************************************************
*                     Cluster Computing - CSC 591c                     *
*                              Homework 4                              *
*                                                                      *
*                Anwar Ali, Annika Edwards, Nhon Nguyen                *
*                            April 10, 2003                            *
************************************************************************
*                                                                      *
*     Evaluation and Extension of mpiP: Lightweight, Scalable MPI      *
*           Profiling Tool, Using the IRS Benchmark Code (IRS)         *
************************************************************************


Introduction
------------

Parallel programs are designed to achieve a speedup comparable to the
number of available processors. According to Amdahl's Law, this speedup
is limited by the fraction of the program that is sequential. Beyond
that limit, the time spent communicating data can also restrict
speedup. To increase speedup, communication time must be reduced, and
to do so a programmer needs a tool that can pinpoint where a parallel
program spends most of its time communicating.

Jeffrey S. Vetter of Lawrence Livermore National Laboratory has
produced such a tool for programs parallelized with MPI. "mpiP is a
lightweight profiling library for MPI applications." It reports how
much time is spent executing MPI calls. Data is collected for each
process and for each call site, then aggregated into a single output
file. mpiP does not add considerable execution time to a program.


Problem Statement
-----------------

In this project we seek to understand fully how mpiP is used and how
its output can help us determine where communication time should be
reduced. For example, if several processes spend a large amount of time
in an MPI_Barrier call, this may indicate a load imbalance. Once we
gain this understanding, we will determine how mpiP can be improved and
implement that improvement. We will use the IRS Benchmark Code (IRS),
which runs on both SMP and multi-node systems, to measure and compare
the performance of a large application on the cluster.
This will help us understand more about large-scale applications on SMP
machines and parallel architectures.


Outline
-------

- Install and configure mpiP.
- Deploy the IRS Benchmark Code onto the cluster.
- Run mpiP on the IRS benchmark to determine where the communication
  problems are, if any, and what their causes could be. We will then
  attempt to fix any problems we find.
- Run the code for different cases:
  * Sequential
  * Threads
  * MPI parallel
  * MPI parallel and threads parallel
- Collect and analyze data to learn how the application performs across
  all of the above scenarios.
- Study further improvements to IRS and mpiP. This will take the
  majority of our time. Once we determine what can be done, we will
  implement our solution.


References
----------

mpiP: Lightweight, Scalable MPI Profiling
http://www.llnl.gov/CASC/mpip/

The IRS Benchmark Code
http://www.llnl.gov/asci/purple/benchmarks/limited/irs/

Parallel Implicit Solvers for Radiation Transport Systems
http://research.nianet.org/~dimitri/ASCI/

Jeffrey S. Vetter and Michael O. McCracken, "Statistical Scalability
Analysis of Communication Operations in Distributed Applications,"
Proc. ACM SIGPLAN Symp. on Principles and Practice of Parallel
Programming (PPoPP), 2001.
http://llnl.gov/CASC/people/vetter/people/pubs/ppopp01_scal_analysis.pdf

Jeffrey S. Vetter and Andy Yoo, "An Empirical Performance Evaluation of
Scalable Scientific Applications," Supercomputing Conf. Technical
Paper, 2002.
http://sc-2002.org/paperpdfs/pap.pap222.pdf


Project Web Page
----------------

http://www4.ncsu.edu/~aredward/csc591c/index.html