Recognition of Reductions and Loop-Interchange using Polly

(was: LLVM Auto-vectorization using Polly)

(was: Binary Translation and Optimization of Alpha binaries using LLVM/QEMU)


Progress

After my latest report, I began working on implementing vectorization in Polly [1]. I sent an initial patch to the polly-dev mailing list on November 30 [2] and received good feedback from Tobias Grosser, the author of Polly, and some stylistic suggestions for the patch. After a few days lay off (for a take-home exam) I resumed work, only to find that Tobias had not only refactored and committed my patch but had gone ahead and implemented auto-vectorization support at the same time [3]. He did say "And sorry again for getting into your project."

He suggested a few topics that I could begin work on, including recognition of reductions which I chose.

I've submitted 11 patches to Polly, all of which have already been committed to the project's git repository [4].

The first five implement reduction recognition and add four test-cases to verify correctness. The last six implement a loop-interchange optimization pass which makes use of the ability to recognize reductions.

Loop-interchanging gives significant speed-ups in my attached tests case.

$ ./reduction_plain 
reduction: 5.427955 seconds 
manually_interchanged: 0.197127 seconds
$ ./reduction_interchanged 
reduction: 0.195193 seconds 
manually_interchanged: 0.194186 seconds

Future Work

Many yet-to-be-written optimization passes in Polly can make use of reduction recognition. Powerful optimizations like parallelization, vectorization, or combination of the two will be able to be applied to basic blocks recognized as reductions. Plenty of work still needs to be done on OpenMP support, for instance. Research into auto-parallelization of reductions would be interesting.

The loop-interchange optimization pass is relatively stupid, blindly interchanging loops if it detects that they are single-statement reductions. This pass needs to be made more general, with the ability to recognize situations when loop-interchange is beneficial and when it is now. With loop-interchange made less stupid, auto-vectorization will be made much simpler as well.

MathML XHTML CSS Tableless UTF-8 RSS