vivdesh
Performance Analysis
We tested our auto-vectorizing compiler on SPARC V9 (with VIS) and x86 (with SSE) platforms. In theory, Intel SSE can execute two operations on double values in one clock cycle, which should ideally halve the run time of vectorized code, a 100% speedup. In practice, the serial components of a program and the optimizations the LLVM compiler performs, both with and without vectorization, make performance evaluation difficult. Still, our simple test case showed a performance gain.
Our benchmark consists of one FOR loop of 100000 iterations whose body is a single vectorizable addition statement, plus two FOR loops of 1000 iterations each: one initializes the first 1000 elements of the input arrays and the other writes out the first 1000 results.
FOR i:=0 TO 999 DO
    b[i]:=i;
    c[i]:=i;
ENDFOR;

FOR i:=0 TO 99999 DO
    a[i]:=b[i]+c[i];
ENDFOR;

FOR i:=0 TO 999 DO
    WRITE(a[i]);
ENDFOR;
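To illustrate what vectorizing the addition loop amounts to on x86, here is a minimal hand-written C sketch. It is not the code our compiler emits. It assumes the array elements are doubles and that the target supports SSE2, the extension that provides packed double-precision arithmetic. The function name `add_arrays` is hypothetical. Two elements are loaded, added, and stored per iteration, with a scalar loop for any leftover element and as a fallback when SSE2 is unavailable.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics for packed doubles */
#endif

/* Hypothetical helper: computes a[i] = b[i] + c[i] for i in [0, n).
 * Assumes double elements; the benchmark's element type is an assumption. */
void add_arrays(double *a, const double *b, const double *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Vector part: one 128-bit register holds two doubles, so each
     * iteration performs two additions with a single add instruction. */
    for (; i + 2 <= n; i += 2) {
        __m128d vb = _mm_loadu_pd(&b[i]);   /* unaligned load of b[i], b[i+1] */
        __m128d vc = _mm_loadu_pd(&c[i]);   /* unaligned load of c[i], c[i+1] */
        _mm_storeu_pd(&a[i], _mm_add_pd(vb, vc));
    }
#endif
    /* Scalar part: handles the remainder, or the whole range without SSE2. */
    for (; i < n; ++i)
        a[i] = b[i] + c[i];
}
```

Calling `add_arrays(a, b, c, 100000)` corresponds to the middle loop of the benchmark. The unaligned load and store variants are used so the sketch makes no assumption about array alignment.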


The execution times, in seconds, of the vectorized and non-vectorized benchmark programs on the two SIMD architectures are shown below.
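[Figure: perfanalysis.JPG, execution times in seconds of the vectorized and non-vectorized benchmark on the two architectures]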
Thus, given the serial (non-vectorizable) portion of the program and the other optimizations applied by the LLVM compiler, we measured a speedup of 25.5%.
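One way to relate this figure to the ideal 100% is Amdahl's law, which we did not apply in the measurements above; the sketch below is only an illustration. It assumes that "25.5%" means the non-vectorized run took 1.255 times as long as the vectorized one, and that the vectorized portion runs exactly twice as fast. The function names are hypothetical.

```c
/* Amdahl's law: overall speedup when a fraction p of the original run
 * time is accelerated by a factor s and the rest is unchanged. */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

/* Inverse of the above: the fraction p of run time that must have been
 * accelerated by s to yield the observed overall speedup (requires s > 1). */
double vector_fraction(double overall, double s)
{
    return (1.0 - 1.0 / overall) * s / (s - 1.0);
}
```

Under those assumptions, `vector_fraction(1.255, 2.0)` is about 0.41, meaning roughly 41% of the original run time would have been spent in code that benefited from vectorization, with the rest in serial work such as the initialization and WRITE loops.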
