Homework 1
Deadline: see web page
Assignments: All parts are
to be solved individually and turned in electronically; written parts
must be plain ASCII text (no Word, PostScript, etc. permitted unless
explicitly stated).
Please use the henry2 cluster (Linux). All programs
have to be written in C, translated with mpicc/gcc and turned in with a
corresponding Makefile.
-
(0 points) Learn how to compile and execute an MPI program.
-
Log in to a 32-bit login node of henry2:
ssh login.hpc.ncsu.edu -l <your-unity-username>
Use your unity password.
Notice: To login to the 64-bit login node, you would need to type:
ssh login64.hpc.ncsu.edu -l <your-unity-username>
-
Choose one of the three compilers: Gnu / Intel / Portland:
add gnu
add gnu_64 (for the 64-bit version)
add intel
add pgi
-
Write a simple MPI program, such as for calculating Pi (see class
notes or browse the web).
--- Notice: This is one of the few times you are
allowed to use program code from the web for a homework. See class policies.
-
Compile the program:
mpicc -g -o pi pi.c
-
Execute the program on 2 processors:
mpirun -np 2 pi
Try again with a different number of processors.
-
Create a job script pi.bsub (using mpiexec instead of mpirun) and
submit a batch job for 2, 4, 8,
... processors with the LSF command
bsub < pi.bsub
Notice: "Because the job asks for 4 or fewer processors and less than
15 minutes of time, it goes into the high priority debug queue, so
that turnaround is fast and mistakes can be quickly corrected."
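A minimal pi.bsub might look like the sketch below (the wall-time, task count, and output file names are assumptions; adjust them for your runs):

```shell
#!/bin/bash
#BSUB -n 2              # number of MPI tasks
#BSUB -W 10             # wall-clock limit in minutes (under 15 => debug queue)
#BSUB -o pi.out.%J      # stdout file; %J expands to the job id
#BSUB -e pi.err.%J      # stderr file
mpiexec ./pi
```

Resubmit with -n 4, -n 8, ... to cover the requested processor counts.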
-
Monitor the job's progress with
bjobs
Enter the command repeatedly until the job is done. Then, inspect the
output/error files.
-
If you ever want to kill a job, issue
bkill <jobid>
where <jobid> is the job ID obtained from bjobs.
Other useful LSF commands: bpeek
(see output of running job), bhist, bqueues, bhosts, bmod, bbot/btop,
bswitch, bstop/bresume, bkill (see
LSF for
Users for details).
Other useful bsub options: -R "span[ptile=1]" (run 1 task per node).
-
Enhance your program with printf() statements and submit another job.
Check the output file for the printf output. The printf() debugging
technique is your best friend for batch jobs.
Notice: You have very limited disk space in your home directory on
henry2. However, there is more disk space at
/gpfs_share/csc548. Utilize it wisely as it is shared between all 548
students.
Hints:
Nothing to turn in; this is just a warm-up exercise.
-
(50 points) Write an MPI program that determines the
point-to-point message latency for pairs of nodes. You should exchange
point-to-point messages with short message volume (less than 1KB)
between any two nodes and time the round-trip time (rtt). Also report
min/max times. The result/output should be three matrices with node
names (rows/columns) and min/max/rtt values in microseconds. Matrices
are preceded by their respective description: min/max/rtt (in a single
line). Report numbers for at least 16 different nodes. (You may try
larger values if you can get your job through the queues.)
In a README file, try to explain different values in the matrices in
reference to the possible network configuration of nodes on the
cluster.
Hints:
- man PMPI_Get_processor_name
- man gettimeofday
- Use exclusive execution and 1 processor per node resource bsub options:
#BSUB -n 16
#BSUB -x
#BSUB -R "span[ptile=1]"
- Average the rtt over 8 exchanges (skipping the first exchange) -- why?
- Ensure that only two nodes are exchanging messages at any time -- why?
(You could also compare with results not observing this hint.)
- A good message payload is your rank + your hostname (also handy for debugging).
- You may leave the diagonal zero in the matrices.
- Sample output:
AVG:
blade30-5 blade13-10 blade11-7 blade26-10 blade12-2 blade27-11 blade32-8 blade11-1 blade28-3 blade26-5 blade30-11 blade13-14 blade35-5 blade35-3 blade35-13 blade35-12
blade30-5 0 165 162 158 163 151 153 163 149 154 101 176 151 149 148 163
...
Turn in the files rtt.c, Makefile.rtt, rtt.out, rtt.bsub and rtt.README.
-
(50 points) Implement the Pi approximation algorithms in three
different ways: (c) with collective communication (Broadcast/Reduce,
see lecture nodes), (b) with blocking point-to-point communication
(Send/Receive) and (n) with nonblocking communication
(Isend/Irecv/Waitall/Wait). Options (b) and (n) should have two variants:
(r) rooted centralized approach (communicate with rank zero) and (t)
tree-based approach (manually create a binary reduction tree rooted in
rank zero and communicate along the edges to simulate the broadcast
and reduction).
Compare the performance (using MPI_Wtime) for long-running inputs
(large number of intervals) for each approach with submitted jobs (to
ensure low contention). Show your results and comment on the outcome
in the README file.
Turn in the files pic/pibr/pibt/pinr/pint.c, Makefile.pi, and pi.README.
Hints:
What to turn in for programming assignments:
-
commented program(s) as source code; comments count for 15% of the
points (see class policy for guidelines on comments)
-
Makefiles (if required)
-
test programs as source (and input files, if required)
-
README (documentation to outline solution and list commands to
install/execute)
-
in each file, include the following information as a comment at the top
of the file, where "username" is your unity login name and the single author
is the person who wrote this file:
Single Author info:
username FirstName MiddleInitial LastName
How to turn in:
Use the "Submit Homework" link on the course web page. Please
upload all files individually (no zip/tar balls).
Remember: If you submit a file for the second time, it will overwrite
the original file.
Additional references: