SPE performance profile checkpoints

The simulator can automatically capture system-wide performance statistics that are useful in determining the sources of performance degradation, such as channel stalls and instruction-scheduling problems.

You can also use SPE performance profile checkpoints to delimit a specific region of code over which performance statistics are to be gathered.

Performance profile checkpoints (such as prof_clear, prof_start, and prof_stop in the code samples below) can be used to capture higher-level statistics such as the total number of instructions, the number of instructions other than no-op instructions, and the total number of cycles executed by the profiled code segment.

The checkpoints are special no-op instructions that signal the simulator to perform a particular action. Because they are no-ops, the same program can also be executed unchanged on real hardware, where the checkpoints have no functional effect. An SPE header file, profile.h, provides a convenient function-call-like interface for invoking these instructions.

In addition to displaying performance information, certain performance profile checkpoints can control the statistics-gathering functions of the SPU.

For example, profile checkpoints can be used to capture the total cycle count on a specific SPE. That statistic can then guide further tuning of the algorithm or of the structure of the SPE program. The following example shows the profile-checkpoint code that can be added to an SPE program to clear, start, and stop a performance counter:
	#include <profile.h>

	/* ... */
	prof_clear();     // clear the performance counters
	prof_start();     // start recording performance statistics

	/* code to be profiled */

	prof_stop();      // stop recording performance statistics
When a profile checkpoint is encountered in the code, an instruction is issued to the simulator, causing it to print data identifying the calling SPE and the associated timing event. The data is displayed in the simulator control window in the following format:
SPUn: CPm, xxxxx(yyyyy), zzzzzzz
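where:
	n is the number of the SPE that issued the checkpoint,
	m is the checkpoint number,
	xxxxx is the total number of instructions executed,
	yyyyy is the number of instructions other than no-op instructions, and
	zzzzzzz is the total number of cycles.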
The following example, tpa2_spu.c, is the tpa1_spu program with its loop instrumented with the prof_clear, prof_start, and prof_stop profile checkpoints. The relevant code is shown here.
// file tpa2_spu.c

#include <sim_printf.h>
#include <profile.h>

	...

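	prof_clear();	// clear the performance counters
	prof_start();	// start recording performance statistics
	for( i=0; i<spe_num*3; i++ )
		sim_printf("SPE#: %lld, Count: %d\n", spe_num, i);
	prof_stop();	// stop recording performance statistics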
Figure 1 shows the output produced by the program.
Figure 1. Profile checkpoint output for SPE 2
SPU2: CP0, 863(740), 17800
clear performance info.
SPU2: CP30, 0(0), 1
start recording performance info.
SPE#: 25296904, Count: 0
SPE#: 25296904, Count: 1
SPE#: 25296904, Count: 2
SPE#: 25296904, Count: 3
SPE#: 25296904, Count: 4
SPE#: 25296904, Count: 5
SPU2: CP31, 118(103), 400
stop recording performance info.