Dynamic analysis of SPE threads

The listing below shows a dynamic timing analysis of the same SPE inner loop, obtained by running it on the IBM Full System Simulator for the Cell Broadband Engine.

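The results confirm the view of program execution given by the static timing analysis. Dependency stalls account for 65.0% of all cycles, most of them on the FP6, LS, and SHUF pipelines, and the overall CPI is 2.39: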
  SPU DD1.0
  Total Cycle count               43120454
  Total Instruction count         18068949
  Total CPI                       2.39
  Performance Cycle count         43120454
  Performance Instruction count   18068949 (18062968)
  Performance CPI                 2.39 (2.39)
  Branch instructions             1001990
  Branch taken                    1000007
  Branch not taken                1983
  Hint instructions               1973
  Hint hit                        1000001
  Contention at LS between Load/Store and Prefetch 2000986
  Single cycle                                          12049144 ( 27.9%)
  Dual cycle                                             3006912 (  7.0%)
  Nop cycle                                                 4003 (  0.0%)
  Stall due to branch miss                                 17977 (  0.0%)
  Stall due to prefetch miss                                   0 (  0.0%)
  Stall due to dependency                               28042299 ( 65.0%)
  Stall due to fp resource conflict                            0 (  0.0%)
  Stall due to waiting for hint target                       110 (  0.0%)
  Stall due to dp pipeline                                     0 (  0.0%)
  Channel stall cycle                                          0 (  0.0%)
  SPU Initialization cycle                                     9 (  0.0%)
  Total cycle                                           43120454 (100.0%)
  Stall cycles due to dependency on each pipelines
   FX2        5909
   SHUF       6011772
   FX3        1960
   LS         7022608
   BR         0
   SPR        0
   LNOP       0
   NOP        0
   FXB        0
   FP6        15000050
   FP7        0
   FPD        0
  The number of used registers are 73; the used ratio is 57.03
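The derived figures in the listing can be cross-checked against its raw counters. The Python sketch below is a hypothetical helper, not part of the simulator. It parses the cycle-breakdown rows, recomputes the CPI and each category's share of total cycles, and checks that the per-pipeline dependency stalls add up to the reported total. The counter values are copied from the listing above. The regular expression assumes the row layout shown there. The single-issue/dual-issue arithmetic assumes that a "Single cycle" issues one instruction and a "Dual cycle" issues two.

```python
import re

# Cycle-breakdown rows copied from the simulator listing above.
LISTING = """\
  Single cycle                                          12049144 ( 27.9%)
  Dual cycle                                             3006912 (  7.0%)
  Nop cycle                                                 4003 (  0.0%)
  Stall due to branch miss                                 17977 (  0.0%)
  Stall due to prefetch miss                                   0 (  0.0%)
  Stall due to dependency                               28042299 ( 65.0%)
  Stall due to fp resource conflict                            0 (  0.0%)
  Stall due to waiting for hint target                       110 (  0.0%)
  Stall due to dp pipeline                                     0 (  0.0%)
  Channel stall cycle                                          0 (  0.0%)
  SPU Initialization cycle                                     9 (  0.0%)
"""

TOTAL_CYCLES = 43120454
TOTAL_INSTRUCTIONS = 18068949

# Dependency stalls per pipeline (pipelines with zero stalls omitted).
DEP_STALLS_BY_PIPE = {"FX2": 5909, "SHUF": 6011772, "FX3": 1960,
                      "LS": 7022608, "FP6": 15000050}

USED_REGISTERS = 73
SPU_REGISTERS = 128  # the SPU register file holds 128 registers

# "<category name>   <cycles> ( <percent>%)"
ROW = re.compile(r"^\s*(.+?)\s+(\d+)\s+\(\s*[\d.]+%\)\s*$")


def parse_cycle_breakdown(text):
    """Return {category: cycles} for every row of the cycle breakdown."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(1)] = int(m.group(2))
    return rows


def summarize(rows):
    """Return (CPI, {category: percent of total cycles})."""
    cpi = TOTAL_CYCLES / TOTAL_INSTRUCTIONS
    shares = {name: 100.0 * n / TOTAL_CYCLES for name, n in rows.items()}
    return cpi, shares


def issued_instructions(rows):
    """Assumption: one instruction per single cycle, two per dual cycle."""
    return rows["Single cycle"] + 2 * rows["Dual cycle"]


if __name__ == "__main__":
    rows = parse_cycle_breakdown(LISTING)
    cpi, shares = summarize(rows)
    print(f"CPI = {cpi:.2f}")
    # Show only categories that round to a nonzero percentage.
    for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        if pct >= 0.05:
            print(f"{name:40s} {pct:5.1f}%")
    print("categories sum to total cycles:",
          sum(rows.values()) == TOTAL_CYCLES)
    print("per-pipeline stalls sum to dependency stalls:",
          sum(DEP_STALLS_BY_PIPE.values()) == rows["Stall due to dependency"])
    print("single + 2 * dual cycles =", issued_instructions(rows))
    print(f"register use: {100 * USED_REGISTERS / SPU_REGISTERS:.2f}%")
```

Run as a script, the sketch reproduces the listing's CPI of 2.39 and its 65.0% dependency-stall share. It also confirms that the eleven cycle categories sum exactly to the 43,120,454 total. Single-issue cycles plus twice the dual-issue cycles give 18,062,968, the figure shown in parentheses on the Performance Instruction count line. Of the 28,042,299 dependency-stall cycles, FP6 accounts for 15,000,050, a little over half.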