High-performance Computing on NVIDIA GPUs

Dr. Lars Nyland, Compute Architect, NVIDIA Corporation

Abstract: In this talk, I will cover three topics: 1) the NVIDIA GPU computing architecture, 2) the CUDA programming language, and 3) recent work on N-Body simulation. The GPU architecture supports both graphics and non-graphics computation, using an array of custom processors on a single chip. The programming model is neither SIMD nor MIMD, but somewhere in between, which lets us exploit the advantages of each. The current performance part has 240 processors running at 1.5 GHz. With dual-issue capabilities, this places peak performance just under 1 TFLOPS. CUDA is NVIDIA's C/C++-based language for programming the GPU. It adds a few extensions for thread launch/terminate, synchronization, data sharing, and atomic operations. I'll discuss a collaborative effort with Jan Prins (UNC-CH) and Mark Harris (NVIDIA), in which we have written an N-Body simulator using CUDA that runs on NVIDIA GPUs. We achieve a sustained computational rate of over 400 GFLOPS. I'll finish with a few demonstration applications, as well as a discussion of how other groups are using NVIDIA GPUs to accelerate their computations. As a postscript, I'll mention the "professor partnership program," through which academics can receive GPU computing hardware at no cost.

Bio: Lars Nyland is a senior architect in the Compute group in NVIDIA's Durham, NC office. He designs, develops, and tests architectural features that enable HPC on GPUs. Prior to joining NVIDIA in 2005, he was an associate professor of Computer Science at the Colorado School of Mines. Before that, he was a research associate professor at UNC Chapel Hill from 1991 until 2003. He received his Ph.D. from Duke University in 1991 under the supervision of Professor John Reif, studying parallel programming techniques. In 2000, he worked half-time jumpstarting Deltasphere, Inc., a company that builds scene digitizers (primarily for forensic applications).