2006-2007 Computer Science Seminar

This talk is part of the System Research Seminar series

Date: Monday, May 21, 2007
Time: 10:00 AM
Place: 3211, EB2; NCSU Centennial Campus

Speaker: Martin Schulz, Lawrence Livermore National Laboratory

Developing New Tool Strategies for Scalable HPC Systems

Abstract: Current high-end HPC cluster systems are starting to scale beyond the 10,000-processor mark, and some systems, like Blue Gene/L, have already reached over 130,000 processors. This trend toward higher processor counts will continue and lead to petascale systems in the foreseeable future.

The increasing number of CPUs affects not only applications but also the development environment around them. In particular, tools need to keep up with the applications' scalability. This includes collecting and analyzing data, finding the relevant information, and presenting it to the user in a comprehensible way.

In this talk, I will show how we address these issues within the DOE/ASC (Advanced Simulation and Computing) program. In particular, I will focus on two tool sets we recently developed: STAT helps users gather and analyze distributed stack traces to quickly focus debugging efforts, and P^nMPI enables users to dynamically compose MPI tool chains from existing components as well as customize their scope. Both tool sets give us new insights into scalable systems and are part of a new generation of tools capable of providing efficient support for next-generation HPC machines.

Short Bio: Martin Schulz is a Computer Scientist at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). His research interests include parallel and distributed architectures and applications; performance monitoring, modeling, and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; and fault tolerance at the application and system level.

Martin earned his doctorate in Computer Science in 2001 from the Technische Universitaet Muenchen (Munich, Germany). He also holds a Master of Science in Computer Science from the University of Illinois at Urbana-Champaign. After completing his graduate studies and a postdoctoral appointment in Munich, he worked for two years as a Research Associate at Cornell University before joining LLNL in 2004. Currently, his projects include the design and development of performance tools for large-scale parallel systems, in particular Open|SpeedShop; work on application and communication optimizations for Blue Gene/L; the use of machine learning techniques for performance analysis and modeling; and scalable debugging. He is a member of the ACM and the IEEE Computer Society.

Host: Frank Mueller, Computer Science, NCSU
