The listing below shows a dynamic timing analysis, run on the IBM Full System Simulator for the Cell Broadband Engine, of the optimized SPE thread (process buffer only). It reports that 78 registers are used, which is 60.94% of the SPU's 128 registers.
SPU DD1.0
***
Total Cycle count                 7134843
Total Instruction count          10602009
Total CPI                            0.67
***
Performance Cycle count           7134843
Performance Instruction count    10602009 (9839265)
Performance CPI                      0.67 (0.73)

Branch instructions                253940
Branch taken                       251967
Branch not taken                     1973

Hint instructions                    2952
Hint hit                           250980

Contention at LS between Load/Store and Prefetch   6871

Single cycle                              3815689 ( 53.5%)
Dual cycle                                3011788 ( 42.2%)
Nop cycle                                    5898 (  0.1%)
Stall due to branch miss                    34655 (  0.5%)
Stall due to prefetch miss                      0 (  0.0%)
Stall due to dependency                    266732 (  3.7%)
Stall due to fp resource conflict               0 (  0.0%)
Stall due to waiting for hint target           72 (  0.0%)
Stall due to dp pipeline                        0 (  0.0%)
Channel stall cycle                             0 (  0.0%)
SPU Initialization cycle                        9 (  0.0%)
-----------------------------------------------------------------------
Total cycle                               7134843 (100.0%)

Stall cycles due to dependency on each pipelines
FX2      8808
SHUF     1971
FX3      5870
LS         32
BR          0
SPR         1
LNOP        0
NOP         0
FXB         0
FP6    250050
FP7         0
FPD         0

The number of used registers are 78, the used ratio is 60.94
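The derived figures in the listing can be cross-checked from its raw counts. The following is a minimal sketch in Python, not part of the simulator or the SDK: the variable names and the `percent` helper are hypothetical, and the register-file size of 128 is an assumption taken from the SPU architecture rather than from the listing itself.

```python
# Raw counts copied from the simulator listing above.
total_cycles = 7134843
total_instructions = 10602009

# Per-category cycle counts from the breakdown table.
cycle_breakdown = {
    "Single cycle": 3815689,
    "Dual cycle": 3011788,
    "Nop cycle": 5898,
    "Stall due to branch miss": 34655,
    "Stall due to prefetch miss": 0,
    "Stall due to dependency": 266732,
    "Stall due to fp resource conflict": 0,
    "Stall due to waiting for hint target": 72,
    "Stall due to dp pipeline": 0,
    "Channel stall cycle": 0,
    "SPU Initialization cycle": 9,
}

# Dependency stall cycles attributed to each pipeline.
dependency_stalls_by_pipeline = {
    "FX2": 8808, "SHUF": 1971, "FX3": 5870, "LS": 32,
    "BR": 0, "SPR": 1, "LNOP": 0, "NOP": 0,
    "FXB": 0, "FP6": 250050, "FP7": 0, "FPD": 0,
}

used_registers = 78
spu_register_count = 128  # assumption: size of the SPU register file


def percent(part, whole):
    """Return part as a percentage of whole (hypothetical helper)."""
    return 100.0 * part / whole


# Cycles per instruction: total cycles divided by instructions executed.
cpi = total_cycles / total_instructions

# Share of the register file that the SPE thread uses.
register_usage = percent(used_registers, spu_register_count)

# Share of all dependency stalls that fall on the FP6 pipeline.
fp6_share = percent(
    dependency_stalls_by_pipeline["FP6"],
    cycle_breakdown["Stall due to dependency"],
)

print(f"CPI: {cpi:.2f}")
print(f"Register usage: {register_usage:.2f}%")
print(f"FP6 share of dependency stalls: {fp6_share:.1f}%")
for name, cycles in cycle_breakdown.items():
    print(f"{name}: {percent(cycles, total_cycles):.1f}%")
```

With these counts, the per-category cycles sum to the reported total of 7134843, and the per-pipeline stalls sum to the 266732 cycles reported as "Stall due to dependency". Nearly all of those dependency stalls fall on the FP6 pipeline, which suggests that FP6 is where any further dependency-stall tuning of this loop would have to look.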
For details about performance simulation, including examples of coding for simulations, see The simulator. The IBM Full System Simulator for the Cell Broadband Engine described in that chapter supports performance simulation for a full system, including the MFCs, caches, bus, and memory controller.