The listing below shows a dynamic timing analysis of the same SPE inner loop, produced with the IBM Full System Simulator for the Cell Broadband Engine.
SPU DD1.0
***
Total Cycle count                     43120454
Total Instruction count               18068949
Total CPI                             2.39
***
Performance Cycle count               43120454
Performance Instruction count         18068949 (18062968)
Performance CPI                       2.39 (2.39)

Branch instructions                   1001990
Branch taken                          1000007
Branch not taken                      1983

Hint instructions                     1973
Hint hit                              1000001

Contention at LS between Load/Store and Prefetch 2000986

Single cycle                          12049144 ( 27.9%)
Dual cycle                             3006912 (  7.0%)
Nop cycle                                 4003 (  0.0%)
Stall due to branch miss                 17977 (  0.0%)
Stall due to prefetch miss                   0 (  0.0%)
Stall due to dependency               28042299 ( 65.0%)
Stall due to fp resource conflict            0 (  0.0%)
Stall due to waiting for hint target       110 (  0.0%)
Stall due to dp pipeline                     0 (  0.0%)
Channel stall cycle                          0 (  0.0%)
SPU Initialization cycle                     9 (  0.0%)
-----------------------------------------------------------------------
Total cycle                           43120454 (100.0%)

Stall cycles due to dependency on each pipelines
FX2      5909
SHUF     6011772
FX3      1960
LS       7022608
BR       0
SPR      0
LNOP     0
NOP      0
FXB      0
FP6      15000050
FP7      0
FPD      0

The number of used registers are 73; the used ratio is 57.03
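The summary figures in the listing can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the simulator: the variable names are illustrative assumptions, the counts are copied from the listing above, and the register file size of 128 is the SPU architectural value. It recomputes the CPI, confirms that the per-category cycle counts add up to the total, and shows how much of the stall time comes from dependencies.

```python
# Counts copied from the simulator listing; names are illustrative only.
total_cycles = 43120454
total_instructions = 18068949

cycle_breakdown = {
    "single_issue": 12049144,
    "dual_issue": 3006912,
    "nop": 4003,
    "branch_miss_stall": 17977,
    "prefetch_miss_stall": 0,
    "dependency_stall": 28042299,
    "fp_resource_stall": 0,
    "hint_target_stall": 110,
    "dp_pipeline_stall": 0,
    "channel_stall": 0,
    "spu_initialization": 9,
}

# Non-zero entries of the per-pipeline dependency stall table.
dependency_by_pipeline = {
    "FX2": 5909,
    "SHUF": 6011772,
    "FX3": 1960,
    "LS": 7022608,
    "FP6": 15000050,
}

SPU_REGISTER_COUNT = 128  # size of the SPU unified register file
used_registers = 73

# Cycles per instruction, as reported in the "Total CPI" line.
cpi = total_cycles / total_instructions

# Fraction of all cycles lost to dependency stalls.
dependency_share = cycle_breakdown["dependency_stall"] / total_cycles

# Fraction of the dependency stalls attributed to each pipeline.
pipeline_share = {
    name: cycles / cycle_breakdown["dependency_stall"]
    for name, cycles in dependency_by_pipeline.items()
}

register_ratio = 100.0 * used_registers / SPU_REGISTER_COUNT

print(f"CPI: {cpi:.2f}")
print(f"Dependency stalls: {dependency_share:.1%} of all cycles")
for name, share in sorted(pipeline_share.items(), key=lambda kv: -kv[1]):
    print(f"  {name:<5}{share:.1%} of dependency stalls")
print(f"Register usage: {register_ratio:.2f}%")
```

Read this way, the listing shows that dependency stalls account for roughly two thirds of all cycles, and that the FP6, LS, and SHUF pipelines together account for nearly all of those stalls.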