Performance Analysis Tools

Blaise Barney, Lawrence Livermore National Laboratory UCRL-MI-133316

NOTE: This information pertains to retired LC systems and is being kept for archival purposes only.

Table of Contents

  1. Abstract
  2. Scope and Motivation
  3. Performance Considerations and Strategies
  4. Timers
    1. time
    2. timex
    3. gettimeofday()
    4. MPI Timing Routines
    5. system_clock()
    6. read_real_time()
    7. IBM Fortran Routines - rtc, irtc, dtime, etime, mclock, timef
  5. Profilers
    1. prof
    2. gprof
    3. monitor
    4. top
    5. xprofiler
    6. mpiP
  6. Performance Analysis Tools
    1. HPM Toolkit
    2. PE Benchmarker Toolset
    3. VampirGuideView (VGV)
    4. Paraver and Dimemas
    5. Performance Toolbox
    6. Dynamic Probe Class Library (DPCL)
    7. Other Multi-Platform Parallel Performance Analysis Tools
  7. Miscellaneous Tools
    1. vmstat
    2. netstat
    3. iostat
    4. ps
  8. References and More Information
  9. Exercise


Abstract


An essential prerequisite for optimizing an application is understanding its execution characteristics. A number of tools are available to the application developer for this purpose, ranging from simple shell utilities, timers, profilers, and trace analysis tools to sophisticated, full-featured graphical toolsets. This tutorial investigates, in varying depth, a number of tools that can be used to analyze an application's performance for the purposes of optimization and troubleshooting. A lab exercise featuring a subset of these tools is provided.

Level/Prerequisites: A basic understanding of parallel programming in C or Fortran is assumed.



Scope and Motivation


Scope of This Tutorial:

Motivation:

Performance Considerations and Strategies



Timers


time


timex

Note that much of the timex command's output is NOT described in the timex man page. Some of it may be understood from reading the sar command man page.


gettimeofday()


MPI Timing Routines


system_clock()


read_real_time()
read_wall_time()
time_base_to_time()


xlf Fortran Timing Routines

The routines described here are included as part of the IBM xlf compiler's service and utilities procedures. They will probably not be found in non-IBM Fortran environments.

rtc()

irtc()

dtime_()

etime_()

mclock()

timef()



Profilers

prof


gprof

Additional Notes About prof and gprof:


monitor


top


xprofiler

Overview:

Example Displays and Reports:

Using xprofiler:

  1. Location:

  2. Compile and link your program with both the -g and -pg options. The -g option enables source statement profiling and -pg turns profiling on.

    Note: when you compile and link separately, you must use the -pg option with both the compile and link commands.

  3. Run your serial or parallel code as usual. When it has completed, you will find one statistics file for each task. Serial codes produce gmon.out. For parallel jobs, the files will be called gmon.out.0, gmon.out.1, gmon.out.2 and so on.

  4. Invoke xprofiler. This can be done in several ways, as shown below. Note: on BG/L and BG/P platforms the command is not in your default path and is spelled with an uppercase "X"; the examples below assume that you have set up an alias for it on those platforms.

    xprofiler
    Starts without a file loaded. Load a file from xprofiler's File pull-down menu.
    xprofiler myprog gmon.out
    Loads the serial program with its statistics file
    xprofiler myprog gmon.out.N
    Loads the parallel program with the selected statistics file
    xprofiler myprog gmon.out.*
    Loads the parallel program with combined/merged statistics files

    Note that there are also several command line flags available to define certain xprofiler characteristics and behaviors.

  5. Use xprofiler's pull-down menus and hidden menus (press the right mouse button on an object such as an arc or function box) to accomplish desired actions, such as:
    • Zooming in and out
    • Examining arc information
    • Examining function statistics
    • Producing reports
    • Loading new files
    • Setting configuration options
    • Saving/producing screen dumps
    • Unclustering functions from their library group
    • Collapsing/hiding library information
    • and more....

    Important note: it is often necessary to uncluster functions and zoom in to reach the important detailed information. It is also usually useful to collapse/hide library information that isn't needed (such as system libraries).

Documentation:


mpiP

Overview:

Using mpiP:

Understanding mpiP Output:

Performance Analysis Tools

HPM Toolkit

Overview:

Using hpmcount:

Using libhpm:

Using hpmviz:


PE Benchmarker Toolset

Overview:

Using the Performance Collection Tool (PCT):

Using the UTE Utilities:

Using the Profile Visualization Tool (PVT):

Documentation: (you're going to need this)


VampirGuideView (VGV)


Paraver and Dimemas


Performance Toolbox

Overview:

Example Displays:


Dynamic Probe Class Library (DPCL)

Overview:

Using DPCL:

Documentation:


Other Multi-Platform Parallel Performance Analysis Tools


Miscellaneous Tools


vmstat - Virtual Memory Statistics


netstat - Network Statistics


iostat - I/O Statistics


ps - Process Status

Several tools which fall into the "other" category are available for the SP environment. Note that some of these tools are installed at LLNL, under development and/or unsupported. Some may even be extinct.

mpi_trace:

MPIMap:

And More...


This completes the tutorial.




References and More Information