To appear at the 4th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'99), San Juan, Puerto Rico, USA, April 12, 1999
Abstract
Parallel programming languages like Fortran 90, Nesl, and approaches based
on the Bird-Meertens formalism (BMF) support operations that manipulate
aggregate data structures as a whole. These operations are usually
implemented using a library of parallel routines based on an
implementation-specific representation of the aggregate structures. The
library routines are often high-level and thus simplify the compiler and
encapsulate machine dependencies. However, such an approach complicates
optimizations that aim at good use of processor caches and at minimizing
communication on distributed-memory machines. We propose a different
implementation technique for aggregate data structures that enables
compiler optimizations at all levels of the memory hierarchy.
Breaking the abstraction enforced by a library makes optimizations across
multiple operations possible and allows more flexible data distribution. We
outline an intermediate language that distinguishes local from distributed
data structures and separates local from global operations. This language
allows the optimizations to be implemented by program transformation; as a
result, they are easy to comprehend, the compiler can be cleanly
structured, the transformations are easily reordered, and each can be
proved correct individually. Finally, we report first results from applying
our approach to the implementation of nested data parallelism on
distributed-memory machines.
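
To make the local/global distinction concrete, the following minimal
Haskell sketch separates node-local arrays from distributed ones. All
names (Local, Dist, mapD, sumD) are hypothetical illustrations, not the
paper's actual intermediate language.

    -- Hypothetical types: a node-local array chunk, and a distributed
    -- array consisting of one local chunk per processor.
    newtype Local a = Local [a]
    newtype Dist  a = Dist  [Local a]

    -- A purely local operation: each processor transforms its own
    -- chunk independently, so no communication is needed.
    mapD :: (a -> b) -> Dist a -> Dist b
    mapD f (Dist cs) = Dist [ Local (map f xs) | Local xs <- cs ]

    -- A global operation: per-processor partial sums must be combined,
    -- which on a real machine implies communication (modelled here by
    -- a plain fold over the chunks).
    sumD :: Num a => Dist a -> a
    sumD (Dist cs) = sum [ sum xs | Local xs <- cs ]

Because a local operation like mapD is known to be communication-free, a
compiler working on such a representation may fuse adjacent local
operations, for example rewriting mapD f (mapD g xs) into mapD (f . g) xs,
an optimization across multiple operations that a fixed library interface
would hide.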