Homework 1

Deadline: Sept. 5, 2012
Assignments: All parts are to be solved individually (turned in electronically; written parts in ASCII text; no Word, PostScript, etc. permitted unless explicitly stated).

Please use the ARC cluster (Linux). All programs have to be written in C, translated with mpicc/gcc and turned in with a corresponding Makefile.

  1. (0 points) Learn how to compile and execute an MPI program.

    Hints:

    Nothing to turn in, this is just a warm-up exercise.

  2. (50 points) Write an MPI program that determines the point-to-point message latency for pairs of nodes. Exchange point-to-point messages with varying message sizes (8B, 16B, 32B, ..., 1MB). Your program should run continuously (i.e., do not re-run it for each message size; your program should iterate through the 20 different message sizes).

    In a README file, explain your plots. Do some nodes take longer than others? In particular, could your results be indicative of the underlying network configuration? Explain. Also discuss message size as it relates to latency; are there any odd data points in your graphs?

    Hints:

    Example output: read as follows [message size] [node_1_avg node_1_min node_1_max] [node_2_avg node_2_min node_2_max] ...

    8 8.276531e-06 5.960464e-06 1.788139e-05 5.517687e-06 3.814697e-06 1.096725e-05 5.994524e-06 5.006790e-06 1.096725e-05 6.267003e-06 5.006790e-06 1.192093e-05 5.858285e-06 5.006790e-06 1.096725e-05 5.994524e-06 5.006790e-06 1.192093e-05 6.709780e-06 4.768372e-06 1.215935e-05
    

    Example output graph here

    Turn in the files rtt.c, rtt.out, rtt8.png, rtt16.png, and rtt.README
    (rtt8.png is the graph using 8 nodes, rtt16.png is the graph using 16 nodes).

  3. (50 points) Group problem (groups of up to 3)
    Implement the function analysis code in a parallel scheme:

    Download the code for this problem: hw1.c

    This code implements the serial version of the numerical techniques described below. You will update this code to run in parallel.

    Given some function, the code will numerically compute the derivative at all grid points, the integral of the function over the grid points, and the error and standard deviation of the error of the numerical computation.

    IMPORTANT: You MUST supply a function that is continuous and smooth - that is, one that has an analytic first derivative. Examples include:

    • x^2
    • sqrt(x)
    • sin(2 * x)
    For a list of functions, their derivatives and integrals, see here

    IMPORTANT: Be sure to choose a function that does not behave badly between the user-defined grid values XI and XF. That is, do not perform illegal math operations, such as fn(x) = 1/x on a range containing the grid value x = 0.0, or fn(x) = sqrt(x) with a grid value x = -1.0.

    Compile the code with:

    gcc -g -o hw1 hw1.c -lm
    

    The program will output a file err.dat which gives the computed error data.

    Edit the program to include more grid points, different functions, etc.

    Data Decomposition

    A common approach to implementing a parallel program is data decomposition. The data the program will run with is split up into chunks and a copy of the program runs using each individual chunk.

    Consider the array of N grid points x_1 to x_N:

    For our program, we will split this grid into evenly spaced "slices" by processor rank:

    IMPORTANT: Be sure that the number of grid points is evenly divisible by the number of processors, or make sure your final code handles the case where the number of processors does not evenly divide the number of grid zones.

    Computing derivatives: Finite Differencing

    It is often the case that we cannot analytically calculate the derivative of some data. However, we can approximate it using a finite differencing scheme.

    The method comes from calculus. Recall the definition of the derivative of the function f(x):

        f'(x) = lim as dx -> 0 of [f(x + dx) - f(x)] / dx

    We can assert, with reasonable justification, that if dx is small enough, a good approximation to the derivative is given by:

        f'(x) ~= [f(x + dx) - f(x)] / dx

    This is a good approximation, but a better one is to use the symmetric form of the above:

        f'(x) ~= [f(x + dx) - f(x - dx)] / (2 * dx)
    In our program, let xc[i] be our grid points, and let yc[i] be the value of our function at grid point xc[i]. Using the above, we can approximate the derivative at the grid point xc[i] as:

    dydx[i] = (yc[i+1] - yc[i-1]) / (2.0 * dx)
    

    We should notice two things: (i.) We need to know the boundary conditions of our full grid domain - that is, what the value of the function will be at the left and right edges of the grid, and (ii.) For our decomposed grid, we will need to communicate the boundary values of our function to the processors to the "left" and "right". The diagram below represents one side of this process. The process of rank i will receive the value fn(x_4) (fn(x) represents the value of our function at point x) from the process of rank i+1, and send the value of the function fn(x_3) to the process of rank i+1.

    Computing integrals: Trapezoidal Rule

    Similar to derivatives, integrals of data values must usually be carried out numerically. Here we will use a simple method to approximate our integral: the trapezoidal rule.

    Recall that the integral of a function from x_1 to x_N can be written:

        integral[x_1 to x_N] f(x) dx = sum over i = 1 .. N-1 of integral[x_i to x_(i+1)] f(x) dx

    That is, we can compute one integral for each pair of grid points x_i and x_i+1, then add them together at the end.

    To compute each integral, recall that the integral of a function can be described as the area under the curve. As with the finite difference, if the distance between two grid points is small, we can make a good approximation of the integral between two points by assuming the function is a straight line between them. Then we just have to compute the area of a trapezoid:

    Actually, we are computing the area of a rectangle where the center of the top edge coincides with the center of the interval we are integrating over, and the height of the rectangle is the average value of the function evaluated at the endpoints. In fact, this is equivalent to a trapezoid, but calculating the area of a rectangle is easier. Graphically, the procedure looks like this:

    The green area is our approximate value of the area under the curve, that is, the integral. Notice that it has some gaps, and the biggest gaps occur where the curve is very steep.

    For our parallel implementation, we again need to know the values across domain boundaries. Notice that we have already exchanged this information when calculating the derivatives, so we can simply use it again here. Also notice that, if there are N grid points on the domain, we calculate N - 1 integrals.

    Error analysis: Mean and standard deviation

    By now we know how to calculate good approximations of derivatives and integrals. However, we should recall that these are approximations, and the approximations we've been using are only good, not great. Also take note that these approximations only work with a specific class of functions that fall within narrow parameters. You can think about what happens to the finite difference method if the function is defined piece-wise (instead of continuously), or a function that varies rapidly (e.g. f(x) = sin(1000 * x)).

    As with most numerical methods, determining error is important. There are many procedures to do so, and we will look at a simple but powerful error indicator: the standard deviation.

    Standard deviation is simply a measure of how much the data points are spread out from their "expected" values. A small standard deviation means the data is "good", while a large standard deviation means the data is, maybe not necessarily poor, but inconsistent.

    We will look at the standard deviation of the relative error for the derivatives:

    Note we will need the actual values of the derivatives here, so our function must have an exact expression for the derivative.
    IMPORTANT: Be careful not to divide by zero here; choose a function whose derivative does not evaluate to zero at any of the grid points.

    To find standard deviation, we must first calculate the mean (or average) error of the entire domain. So first we calculate the error at each grid point.

    The standard deviation is then given as:

        stddev = sqrt( (1/N) * sum over i = 1 .. N of (error_i - mean_error)^2 )

    where error_i is the error in the derivative at the point x_i, and mean_error is the average of the error_i over the whole domain.

    The calculation then involves three steps: (i.) Calculate the average error for the local domain, (ii.) Communicate the local average error to a single process, and calculate the average error for the entire domain, (iii.) Use the global average error to calculate the standard deviation.

    Problem procedure

    Using the provided serial code, the assignment is as follows:

    Some guidelines:

    Compare the performance (using MPI_Wtime) for long-running inputs (large number of grid points) for each calculation (finite difference, integral, error), and each communication method (blocking/non-blocking point-to-point, single call/manual reduction) with submitted jobs (to ensure low contention). Show your results and comment on the outcome in the README file. Comment on how the errors change for i.) the number of grid points and ii.) the function used. Example output of err.dat here.
    Example output of hw1_mpi.out here.
    NOTE:The above outputs were created using the function fn(x) = sqrt(x)

    Turn in the files hw1_mpi.c, err.dat, hw1_mpi.out, README

  4. Peer evaluation: Each group member has to submit a peer evaluation form.

What to turn in for programming assignments:

How to turn in:

Use the "Submit Homework" link on the course web page. Please upload all files individually (no zip/tar balls).

Remember: If you submit a file for the second time, it will overwrite the original file.

Additional references: