Static analysis of SPE threads

The listing below shows an spu-timing static timing analysis for the inner loop of the SPE code.

The SPE code itself is shown in "Step 2: Port the PPE code for execution on the SPE" of the Euler Particle-System Simulation example. The listing shows significant dependency stalls (indicated by "-") and poor dual-issue rates. The inner loop has an instruction mix of eight even-pipeline (pipe 0) instructions and ten odd-pipeline (pipe 1) instructions, so the mix is balanced enough that most instructions could be paired. The limiting factor is therefore the data dependencies: any program change that minimizes them will improve the dual-issue rate and lower the cycles per instruction (CPI).
							.L19:
0D                                                78       a       $49,$8,$10
1D 012                                            789      lqx     $51,$6,$9
0D                                                 89      ila     $47,66051
1D 0123                                            89      lqx     $52,$6,$11
0  0                                                9      ai      $7,$7,-1
0  ----456789                                              fma     $50,$51,$12,$52
1       -----012345                                        stqx    $50,$6,$11
1             123456                                       lqx     $48,$8,$10
0D             23                                          ai      $8,$8,4
1D             234567                                      lqa     $44,ctx+16
1               345678                                     lqx     $43,$6,$9
1                ---7890                                   rotqby  $46,$48,$49
1                    ---1234                               shufb   $45,$46,$46,$47
0                        ---567890                         fm      $42,$12,$45
0d                           -----123456                   fma     $41,$42,$44,$43
1d                                ------789012              stqx    $41,$6,$9
0D                                       89                 ai      $6,$6,16
                                                         .L39:
1D                                       8901                brnz    $7,.L19
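The character columns in the above static-analysis listing have the following meanings:

Column 1: the pipeline that issued the instruction, 0 for the even pipeline (pipe 0) and 1 for the odd pipeline (pipe 1).
Column 2: D, d, or blank. D indicates that the instruction was dual-issued with its partner in the row pair. d indicates that dual-issue was possible but did not occur because of a dependency stall. A blank indicates that dual-issue could not be performed.
Column 3: always blank.
Timing columns: one character per clock cycle. Each digit (0-9) is the last digit of a cycle in which the instruction is executing, and each dash ("-") is a cycle in which the instruction is stalled on a dependency.
Remaining columns: the instruction mnemonic and its operands.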

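Static-analysis timing files can be quickly interpreted by:

Scanning the columns of digits. A steep (more vertical) progression from row to row means that many instructions issue within a few cycles, which is good. A shallow (more horizontal) progression means that cycles pass with few instructions issued, which is bad.
Looking for instructions with dependency stalls, indicated by dashes.
Looking at the dual-issue column to see how many row pairs are marked D rather than d or blank.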

This information can be used to understand which areas of the code are well scheduled and which are poorly scheduled.
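
The same reading of a timing file can be automated. The following is a minimal Python sketch, not part of the SDK, that tallies the instruction mix, the dual-issue markers, and the stalled cycles of a listing. It rests on several assumptions: the function name summarize and the key names are hypothetical, a D in the second column is taken to mean a successful dual-issue and a d a missed one, and rows are split on whitespace rather than on fixed columns, so the sample below is the inner loop from this section with its spacing collapsed.

```python
import re
from collections import Counter

# A timing token consists only of cycle digits and stall dashes, e.g. "----456789".
TIMING_TOKEN = re.compile(r"^[-0-9]+$")

# Inner loop from this section, whitespace collapsed (columns 1 and 2 preserved).
LISTING = """\
.L19:
0D 78 a $49,$8,$10
1D 012 789 lqx $51,$6,$9
0D 89 ila $47,66051
1D 0123 89 lqx $52,$6,$11
0  0 9 ai $7,$7,-1
0  ----456789 fma $50,$51,$12,$52
1  -----012345 stqx $50,$6,$11
1  123456 lqx $48,$8,$10
0D 23 ai $8,$8,4
1D 234567 lqa $44,ctx+16
1  345678 lqx $43,$6,$9
1  ---7890 rotqby $46,$48,$49
1  ---1234 shufb $45,$46,$46,$47
0  ---567890 fm $42,$12,$45
0d -----123456 fma $41,$42,$44,$43
1d ------789012 stqx $41,$6,$9
0D 89 ai $6,$6,16
.L39:
1D 8901 brnz $7,.L19
"""


def summarize(listing):
    """Count instructions per pipeline, dual-issue markers, and stalled cycles."""
    stats = Counter()
    for line in listing.splitlines():
        # Instruction rows start with the pipeline digit; labels such as ".L19:" do not.
        if len(line) < 3 or line[0] not in "01" or line[1] not in "Dd ":
            continue
        tokens = line[2:].split()
        timing = []
        # Leading tokens made of digits and dashes are the timing chart
        # (there can be two when the chart wraps around).
        while tokens and TIMING_TOKEN.match(tokens[0]):
            timing.append(tokens.pop(0))
        if not tokens:
            continue  # no mnemonic found, so this is not an instruction row
        stats["pipe" + line[0]] += 1
        if line[1] == "D":
            stats["dual_issued"] += 1
        elif line[1] == "d":
            stats["dual_missed"] += 1
        stats["stall_cycles"] += sum(token.count("-") for token in timing)
    return stats


if __name__ == "__main__":
    for key, value in sorted(summarize(LISTING).items()):
        print(key, value)
```

For the inner loop above, the pipe0 and pipe1 counts reproduce the eight-to-ten instruction mix stated earlier. Comparing stall_cycles before and after a code change gives a rough measure of whether the change reduced the dependency stalls.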

About SPU_TIMING:

If you are using a Bash shell, you can set SPU_TIMING as a shell variable by using the command export SPU_TIMING=1. You can also set SPU_TIMING for a single invocation of make and build the .s file by using the following statement:
 SPU_TIMING=1 make foo.s

This creates the timing file for foo.c. The SPU_TIMING variable is set only in the sub-shell of that make invocation, not in the shell from which you ran the command. The build generates foo.s and then invokes spu-timing on foo.s to produce a foo.s.timing file.
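
The scoping of the variable can be illustrated with a short Python sketch. It is only an analogy under stated assumptions: a child Python process stands in for make so that the example runs without the SDK installed, and the helper names timing_env and child_sees_spu_timing are hypothetical. The child process receives SPU_TIMING=1 in its environment, while the environment of the parent process is left unchanged.

```python
import os
import subprocess
import sys


def timing_env(base=None):
    """Return a copy of the environment with SPU_TIMING=1 added."""
    env = dict(os.environ if base is None else base)
    env["SPU_TIMING"] = "1"
    return env


def child_sees_spu_timing():
    """Run a child process with the modified environment and report what it sees."""
    probe = "import os; print(os.environ.get('SPU_TIMING', 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=timing_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child sees the variable; the parent only does if it was exported earlier.
    print("child :", child_sees_spu_timing())
    print("parent:", os.environ.get("SPU_TIMING", "unset"))
```

Replacing the probe command with ["make", "foo.s"] would correspond to the statement shown above, assuming a makefile that honors SPU_TIMING is present in the working directory.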

Another way to invoke the performance tool is to enter the statement directly at the command prompt, without exporting the variable first:
 SPU_TIMING=1 make foo.s