The simulator can automatically capture system-wide performance statistics that are useful in determining the sources of performance degradation, such as channel stalls and instruction-scheduling problems.
You can also use SPE performance profile checkpoints to delimit a specific region of code over which performance statistics are to be gathered.
Performance profile checkpoints (such as prof_clear, prof_start, and prof_stop in the code samples below) can be used to capture higher-level statistics such as the total number of instructions, the number of instructions other than no-op instructions, and the total number of cycles executed by the profiled code segment.
The checkpoints are special no-op instructions that tell the simulator to perform a particular action. No-op instructions are used because they allow the same program to run unchanged on real hardware. An SPE header file, profile.h, provides a convenient function-call-like interface for invoking these instructions.
In addition to displaying performance information, certain performance profile checkpoints can control the statistics-gathering functions of the SPU.
    #include <profile.h>
    . . .
    prof_clear();   // clear performance counter
    prof_start();   // start recording performance statistics
    …
    <code_to_be_profiled>
    …
    prof_stop();    // stop recording performance statistics
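Each checkpoint produces a line of simulator output in the following format:

    SPUn: CPm, xxxxx(yyyyy), zzzzzzz

where:
    n is the number of the SPE on which the checkpoint was issued
    m is the checkpoint number
    xxxxx is the instruction count
    yyyyy is the instruction count excluding no-op instructions
    zzzzzzz is the cycle count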
    // file tpa2_spu.c
    #include <sim_printf.h>
    #include <profile.h>
    ...
    prof_clear();
    prof_start();
    for( i=0; i<spe_num*3; i++ )
        sim_printf("SPE#: %lld, Count: %d\n", spe_num, i);
    prof_stop();
    SPU2: CP0, 863(740), 17800 clear performance info.
    SPU2: CP30, 0(0), 1 start recording performance info.
    SPE#: 25296904, Count: 0
    SPE#: 25296904, Count: 1
    SPE#: 25296904, Count: 2
    SPE#: 25296904, Count: 3
    SPE#: 25296904, Count: 4
    SPE#: 25296904, Count: 5
    SPU2: CP31, 118(103), 400 stop recording performance info.