To appear at (HIPS'01), San Francisco, California, USA, April 23, 2001
Abstract
Explicit-multithreading (XMT) is a parallel programming model designed for
exploiting on-chip parallelism. Its features include a simple thread
execution model and an efficient prefix-sum instruction for synchronizing
shared data accesses. By taking advantage of low-overhead parallel threads
and high on-chip memory bandwidth, the XMT model tries to reduce the
burden on programmers by obviating the need for explicit task assignment
and thread coarsening. This paper presents features of the XMT
programming model, and evaluates their utility through experiments on a
prototype XMT compiler and architecture simulator. We find the lack of
abstract explicit task assignment has slight effects on performance for the XMT
architecture. Despite low thread overhead, thread coarsening is still
necessary to some extent, but can usually be automatically applied by the
XMT compiler. The prefix-sum instruction provides more scalable
synchronization than traditional locks, and the simple
run-until-completion thread execution model (no busy- waits) does not
impair performance. Finally, the combination of features in XMT can
encourage simpler parallel algorithms that may be more efficient than more
traditional complex approaches.