To appear at (HIPS'99), San Juan, Puerto Rico, USA, April 12, 1999
Abstract
In this paper, we present an efficient framework for intra-procedural
performance based program partitioning for sequential loop nests.
Due to the limitations of static dependence
analysis especially in the inter-procedural
sense, many loop nests are identified as sequential
but available task parallelism amongst them
could be potentially exploited. Since this available parallelism
is quite limited, performance based program analysis
and partitioning which carefully analyzes the interaction
between the loop nests and the underlying architectural
characteristics must be undertaken to effectively use this parallelism.
We propose a compiler driven approach that configures
underlying architecture to support a given communication mechanism. We
then devise an iterative program partitioning
algorithm that generates efficient program partitioning by analyzing
interaction between effective cost of communication and the corresponding
partitions. We model this problem as one of partitioning a directed acyclic
task graph (DAG) in which each node is identified with a sequential
loop nest and the edges denote the precedences and communication
between the nodes corresponding to data transfer between loop nests.
We introduce the concept of behavioral edges between edges and nodes
in the task graph for capturing the interactions between computation and
communication through parametric functions. We present an efficient
iterative partitioning algorithm using the behavioral edge augmented PDG
to incrementally compute and improve the schedule. A significant
performance improvement (factor of 10 in many cases) is demonstrated
by using our framework on some applications which exhibit this type of
parallelism.