Please use the ARC cluster for this assignment. All programs must be written in C, compiled with mpicc/gcc, and turned in with a corresponding Makefile.
There are several methods to approximate Pi, the constant that relates a circle's circumference to its radius or diameter. One of these methods is the Leibniz formula for Pi:
pi/4 = 1 - 1/3 + 1/5 - 1/7 + 1/9 - ...
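As a point of reference, the Leibniz series can be summed serially in a few lines of C. This is only a sketch: the function name leibniz_pi and the choice to pass the number of terms as a parameter are assumptions made for illustration, not part of the provided code. A CUDA version would typically have each thread sum a subset of the terms and then combine the partial sums.

```c
/* Approximate pi with the first n_terms terms of the Leibniz series:
   pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
   (leibniz_pi is a hypothetical name used for this sketch.) */
double leibniz_pi(long n_terms)
{
    double sum = 0.0;
    for (long k = 0; k < n_terms; k++) {
        double term = 1.0 / (2.0 * (double)k + 1.0);
        /* Even-indexed terms are added, odd-indexed terms subtracted. */
        sum += (k % 2 == 0) ? term : -term;
    }
    return 4.0 * sum;
}
```

The series converges slowly: the error after N terms is roughly 1/N, so about a million terms are needed for five or six correct digits.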
Following the lecture, obtain the corresponding code and write the CUDA version by replacing the integration method with the Leibniz formula (Leibniz_pi.c). More information about the Leibniz formula for Pi.
nvcc p2.cu -o p2 -O3 -lm -Wno-deprecated-gpu-targets
Turn in p2.cu (CUDA file), p2.Makefile and p2.README (Explaining the implementation).
We will extend the methods of the last HW into two dimensions.
Download, extract, and compile the code in lake.tar.
This program models the surface of a lake onto which some pebbles have been thrown. The program works as follows. In the spatial domain, a central finite difference tells each zone how to update itself using information from its neighbors.
The time domain is handled similarly, but using information from the previous two time steps.
The program runs two versions of the algorithm: a CPU version and a skeleton GPU version. Your task is to fill in the GPU algorithm to solve the same problem.
Instructions:
V0:
srun -N1 -n1 --pty /bin/bash
./lake {npoints} {npebbles} {end_time} {nthreads}
npoints defines the grid size (npoints x npoints), npebbles is the number of pebbles generated by the program, end_time is the final time of the simulation, and nthreads will be used with the GPU implementation.
The following runs on a grid of (128 x 128), with 5 pebbles, for 1.0 seconds, using 8 GPU threads (implemented later):
./lake 128 5 1.0 8
Running ./lake with (128 x 128) grid, until 1.000000, with 8 threads
CPU took 0.284713 seconds
GPU computation: 0.003168 msec
GPU took 0.327409 seconds
The output files lake_i.dat and lake_f.dat can be converted into .png images using the gnuplot script heatmap.gnu. Run
gnuplot heatmap.gnu
This will create the files lake_i.png (the initial configuration) and lake_f.png (the final configuration) in the directory.
un[idx] = 2*uc[idx] - uo[idx] + VSQR *(dt * dt) *(( WEST + EAST + NORTH + SOUTH - 4 * uc[idx])/(h * h) + f(pebbles[idx],t));
un[idx] = 2*uc[idx] - uo[idx] + VSQR *(dt * dt) * (( WEST + EAST + NORTH + SOUTH + 0.25*(NORTHWEST + NORTHEAST + SOUTHWEST + SOUTHEAST) + 0.125*(WESTWEST + EASTEAST + NORTHNORTH + SOUTHSOUTH) - 6 * uc[idx])/(h * h) + f(pebbles[idx],t));
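The first update above (the 5-point stencil using WEST, EAST, NORTH and SOUTH) can be sketched in plain C as follows. This is an illustration under assumptions, not the code from lake.tar: the function name evolve_5pt, the row-major layout idx = i*n + j, the zero-valued boundary points, and the precomputed force array (standing in for f(pebbles[idx],t)) are all hypothetical.

```c
/* One time step of the 5-point update on an n x n row-major grid.
   Assumptions (not from the handout): boundary points are held at zero,
   and force[idx] already holds the value of f(pebbles[idx], t). */
void evolve_5pt(double *un, const double *uc, const double *uo,
                const double *force, int n, double h, double dt, double vsqr)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int idx = i * n + j;

            /* Skip the stencil on the edge, where neighbors are missing. */
            if (i == 0 || i == n - 1 || j == 0 || j == n - 1) {
                un[idx] = 0.0;
                continue;
            }

            double west  = uc[idx - 1];
            double east  = uc[idx + 1];
            double north = uc[idx - n];
            double south = uc[idx + n];

            un[idx] = 2 * uc[idx] - uo[idx]
                    + vsqr * (dt * dt)
                      * ((west + east + north + south - 4 * uc[idx]) / (h * h)
                         + force[idx]);
        }
    }
}
```

On the GPU, the two loops would disappear: each thread would compute idx from its block and thread coordinates and update a single point.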
The program takes nthreads as an argument. This is the number of threads per block in each dimension on the GPU. For instance, with nthreads=8 and a domain of 128 x 128 grid points (npoints=128), you will create (npoints/nthreads) x (npoints/nthreads) = (16 x 16) blocks, with (8 x 8) threads in each block.
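The block and thread arithmetic can be checked on the host before writing the kernel. The sketch below is plain C, and both helper names (blocks_per_dim, global_index) are hypothetical; in a CUDA kernel the values bx, by, tx and ty would come from blockIdx and threadIdx. It assumes npoints is an exact multiple of nthreads and a row-major grid.

```c
/* Number of blocks along one grid dimension.
   Assumes npoints is an exact multiple of nthreads. */
int blocks_per_dim(int npoints, int nthreads)
{
    return npoints / nthreads;
}

/* Flattened row-major index of the grid point handled by thread (tx, ty)
   of block (bx, by), with nthreads threads per block in each dimension. */
int global_index(int bx, int by, int tx, int ty, int nthreads, int npoints)
{
    int col = bx * nthreads + tx;
    int row = by * nthreads + ty;
    return row * npoints + col;
}
```

With npoints=128 and nthreads=8, blocks_per_dim gives 16, and the last thread of the last block, global_index(15, 15, 7, 7, 8, 128), maps to index 16383, the final point of the 128 x 128 grid.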
srun -N4 -n4 --pty /bin/bash
lake_f_0.dat //node 0
lake_f_1.dat //node 1
//etc.
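One possible way to build these per-node file names is with snprintf. The helper name lake_output_name is hypothetical, and the sketch assumes the node number is the MPI rank of the process (for example, as returned by MPI_Comm_rank).

```c
#include <stdio.h>

/* Write "lake_f_<rank>.dat" into buf (of size len).
   Returns 0 on success, -1 if the name does not fit.
   (lake_output_name is a hypothetical helper for this sketch.) */
int lake_output_name(char *buf, size_t len, int rank)
{
    int written = snprintf(buf, len, "lake_f_%d.dat", rank);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

Each process would call this once with its own rank and then open the resulting name for writing.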
Include in p3.README a discussion of your results. Your discussion should include answering the following questions:
Hints:
double *u_i0; //u^0
double *u_i1; //u^1
These are passed to both the run_cpu and run_gpu routines; both routines should produce the same results.
Turn in p3.README, lake.cu/lakegpu.cu (version V2), lake_mpi.cu/lakegpu_mpi.cu (version V3), p3.Makefile
Single Author info:
username FirstName MiddleInitial LastName
Group info:
username FirstName MiddleInitial LastName
username FirstName MiddleInitial LastName
username FirstName MiddleInitial LastName