You can collect and display simple performance statistics on a program without performing any instrumentation of the program code. Collection of more complex statistics requires program instrumentation.
The names of the PPE and SPE programs are tpa1 and tpa1_spu, respectively. The most important parts of the programs are shown in Example program: tpa1.
PATH=/opt/ibm/systemsim-cell/bin:$PATH; systemsim
systemsim %
mysim spu 0 set model pipeline
mysim spu 1 set model pipeline
mysim spu 2 set model pipeline
mysim go
callthru source tpa1 > tpa1
callthru source tpa1_spu > tpa1_spu
chmod +x tpa1
chmod +x tpa1_spu
tpa1
mysim spu 0 display statistics
mysim spu 1 display statistics
mysim spu 2 display statistics
SPU DD3.0
***
Total Cycle count               35185
Total Instruction count         643
Total CPI                       54.72
***
Performance Cycle count         35185
Performance Instruction count   1701 (1502)
Performance CPI                 20.68 (23.43)

Branch instructions             135
Branch taken                    120
Branch not taken                15

Hint instructions               9
Hint hit                        31

Contention at LS between Load/Store and Prefetch 49

Single cycle                                1108 (  3.1%)
Dual cycle                                   197 (  0.6%)
Nop cycle                                    137 (  0.4%)
Stall due to branch miss                    1655 (  4.7%)
Stall due to prefetch miss                     0 (  0.0%)
Stall due to dependency                      826 (  2.3%)
Stall due to fp resource conflict              0 (  0.0%)
Stall due to waiting for hint target          11 (  0.0%)
Issue stalls due to pipe hazards               6 (  0.0%)
Channel stall cycle                        31236 ( 88.8%)
SPU Initialization cycle                       9 (  0.0%)
-----------------------------------------------------------------------
Total cycle                                35185 (100.0%)

Stall cycles due to dependency on each pipelines
 FX2        62 (  7.5% of all dependency stalls)
 SHUF      322 ( 39.0% of all dependency stalls)
 FX3         2 (  0.2% of all dependency stalls)
 LS        413 ( 50.0% of all dependency stalls)
 BR          0 (  0.0% of all dependency stalls)
 SPR        21 (  2.5% of all dependency stalls)
 LNOP        0 (  0.0% of all dependency stalls)
 NOP         0 (  0.0% of all dependency stalls)
 FXB         0 (  0.0% of all dependency stalls)
 FP6         0 (  0.0% of all dependency stalls)
 FP7         0 (  0.0% of all dependency stalls)
 FPD         6 (  0.7% of all dependency stalls)

The number of used registers are 128, the used ratio is 100.00
dumped pipeline stats
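The derived figures in this listing can be reproduced from the raw counts. The following is a minimal Python sketch, not part of the simulator: the constant and function names are illustrative, the values are copied by hand from the listing above, and the meaning of the parenthesised instruction count is not stated here, so it is treated only as an alternative divisor.

```python
# Values copied by hand from the statistics listing above.
TOTAL_CYCLES = 35185
TOTAL_INSTRUCTIONS = 643
PERF_INSTRUCTIONS = 1701
PERF_INSTRUCTIONS_PAREN = 1502  # the parenthesised count in the listing

# Cycle categories from the breakdown table, in listing order.
CYCLE_BREAKDOWN = {
    "Single cycle": 1108,
    "Dual cycle": 197,
    "Nop cycle": 137,
    "Stall due to branch miss": 1655,
    "Stall due to prefetch miss": 0,
    "Stall due to dependency": 826,
    "Stall due to fp resource conflict": 0,
    "Stall due to waiting for hint target": 11,
    "Issue stalls due to pipe hazards": 6,
    "Channel stall cycle": 31236,
    "SPU Initialization cycle": 9,
}


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction."""
    return cycles / instructions


def percent(part: int, whole: int) -> float:
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


print(f"Total CPI:        {cpi(TOTAL_CYCLES, TOTAL_INSTRUCTIONS):.2f}")
print(f"Performance CPI:  {cpi(TOTAL_CYCLES, PERF_INSTRUCTIONS):.2f} "
      f"({cpi(TOTAL_CYCLES, PERF_INSTRUCTIONS_PAREN):.2f})")
print(f"Breakdown sum:    {sum(CYCLE_BREAKDOWN.values())}")
for name, cycles in CYCLE_BREAKDOWN.items():
    print(f"{name:40s} {cycles:6d} ({percent(cycles, TOTAL_CYCLES):5.1f}%)")
```

The breakdown categories add up to the total cycle count, and channel stalls account for nearly 89% of it, so most of the time in this run is spent waiting on channels and not executing instructions.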
Although the programs on SPE 0 and SPE 2 are the same, the program on SPE 0 executed the loop zero times, whereas the program on SPE 2 executed it six times.
SPU DD3.0
***
Total Cycle count               35537
Total Instruction count         643
Total CPI                       55.27
***
Performance Cycle count         35537
Performance Instruction count   1802 (1590)
Performance CPI                 19.72 (22.35)

Branch instructions             153
Branch taken                    136
Branch not taken                17

Hint instructions               15
Hint hit                        37

Contention at LS between Load/Store and Prefetch 49

Single cycle                                1170 (  3.3%)
Dual cycle                                   210 (  0.6%)
Nop cycle                                    150 (  0.4%)
Stall due to branch miss                    1854 (  5.2%)
Stall due to prefetch miss                     0 (  0.0%)
Stall due to dependency                      879 (  2.5%)
Stall due to fp resource conflict              0 (  0.0%)
Stall due to waiting for hint target          23 (  0.1%)
Issue stalls due to pipe hazards               6 (  0.0%)
Channel stall cycle                        31236 ( 87.9%)
SPU Initialization cycle                       9 (  0.0%)
-----------------------------------------------------------------------
Total cycle                                35537 (100.0%)

Stall cycles due to dependency on each pipelines
 FX2        86 (  9.8% of all dependency stalls)
 SHUF      348 ( 39.6% of all dependency stalls)
 FX3         2 (  0.2% of all dependency stalls)
 LS        413 ( 47.0% of all dependency stalls)
 BR          3 (  0.3% of all dependency stalls)
 SPR        21 (  2.4% of all dependency stalls)
 LNOP        0 (  0.0% of all dependency stalls)
 NOP         0 (  0.0% of all dependency stalls)
 FXB         0 (  0.0% of all dependency stalls)
 FP6         0 (  0.0% of all dependency stalls)
 FP7         0 (  0.0% of all dependency stalls)
 FPD         6 (  0.7% of all dependency stalls)

The number of used registers are 128, the used ratio is 100.00
dumped pipeline stats
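To see where the extra cycles in the second listing come from, the two cycle breakdowns can be compared category by category. The following is a minimal Python sketch under the assumption that the values are copied by hand from the two listings on this page; the names FIRST, SECOND, and cycle_deltas are illustrative and not part of the simulator.

```python
# Cycle breakdowns copied by hand from the two statistics listings.
FIRST = {
    "Single cycle": 1108,
    "Dual cycle": 197,
    "Nop cycle": 137,
    "Stall due to branch miss": 1655,
    "Stall due to prefetch miss": 0,
    "Stall due to dependency": 826,
    "Stall due to fp resource conflict": 0,
    "Stall due to waiting for hint target": 11,
    "Issue stalls due to pipe hazards": 6,
    "Channel stall cycle": 31236,
    "SPU Initialization cycle": 9,
}
SECOND = {
    "Single cycle": 1170,
    "Dual cycle": 210,
    "Nop cycle": 150,
    "Stall due to branch miss": 1854,
    "Stall due to prefetch miss": 0,
    "Stall due to dependency": 879,
    "Stall due to fp resource conflict": 0,
    "Stall due to waiting for hint target": 23,
    "Issue stalls due to pipe hazards": 6,
    "Channel stall cycle": 31236,
    "SPU Initialization cycle": 9,
}


def cycle_deltas(before: dict, after: dict) -> dict:
    """Return only the categories whose cycle count changed."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }


deltas = cycle_deltas(FIRST, SECOND)
extra = sum(deltas.values())  # total difference in cycles between the runs

print(f"Extra cycles in second listing: {extra}")
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:40s} {delta:+5d} ({100.0 * delta / extra:5.1f}% of the difference)")
```

With these figures the second listing has 352 more cycles than the first. Branch-miss stalls contribute 199 of them, while the channel stall count (31236) is identical in both listings.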