Large-scale parallel and distributed systems will be used to meet the computational demands of distributed simulations. These computer models enable scientific experiments that are otherwise prohibited by cost, capability, or treaty (e.g., nuclear weapons testing). Application performance efficiencies (often 5-10% of peak) must improve to realize the computational power of these systems and to decrease simulation time-to-solution significantly. As system size and complexity increase, maintaining such efficiencies will prove challenging. Additionally, the fundamental drive to increase peak performance using thousands of power-hungry components will lead to intolerable operating costs and failure rates. Without innovation, emergent petaflop systems will potentially require 100 megawatts of power (roughly the lighting requirements of a small city) while achieving a shrinking percentage of peak performance. We present our ongoing efforts to reduce the power-performance efficiency gap of distributed applications on large-scale systems. To improve performance efficiency, we propose an analytical model for analysis and prediction. Our model is fast, accurate, and capable of quantifying the performance impact of memory and middleware on distributed communication. We discuss the collaborative use of our model by Argonne National Laboratory researchers to improve MPI performance. To improve power efficiency, we use emergent power-aware technologies (e.g., dynamic voltage scaling, or DVS) to conserve up to 25% of energy for scientific codes on our 16-node Centrino-based Beowulf cluster without significant performance impact (<5%). Our innovative approaches compress the power-performance gap from below, by increasing achieved middleware performance, and from above, by decreasing theoretical peak system speed during periods of application inefficiency to conserve energy.
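
To illustrate the kind of decomposition such an analytical model performs, a generic LogGP-style point-to-point cost expression can be extended with explicit memory and middleware terms. The form below is a sketch under that assumption, not the specific model proposed here; the terms T_mem and T_mw are hypothetical placeholders for the memory-copy and middleware (e.g., MPI) overheads the model quantifies.

% Illustrative only: generic LogGP network cost (send overhead o_s,
% latency L, per-byte gap G for a k-byte message, receive overhead o_r),
% plus assumed memory-copy and middleware overhead terms.
\begin{equation*}
T_{\mathrm{p2p}}(k) \;\approx\;
\underbrace{o_s + L + (k-1)\,G + o_r}_{\text{network (LogGP)}}
\;+\; \underbrace{T_{\mathrm{mem}}(k)}_{\text{memory copies}}
\;+\; \underbrace{T_{\mathrm{mw}}(k)}_{\text{middleware overhead}}
\end{equation*}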
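
As an illustration of the power-aware mechanism referenced above, the following C sketch lowers a core's clock frequency through the Linux cpufreq "userspace" governor before a communication-bound phase and restores it afterward. The sysfs paths and frequency values are assumptions about a typical Centrino-class Linux node; the sketch is not the scheduling policy evaluated in this work.

/*
 * Minimal DVS sketch using the Linux cpufreq "userspace" governor:
 * reduce the core frequency around a communication-bound (CPU-idle)
 * phase, then restore it. Requires root and cpufreq support; the
 * frequencies below are hypothetical P-states for illustration only.
 */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* no cpufreq support or no permission */
    fprintf(f, "%s\n", value);
    fclose(f);
    return 0;
}

int main(void)
{
    const char *gov   = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    const char *speed = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";

    /* Take manual control of CPU 0's frequency. */
    write_sysfs(gov, "userspace");

    /* Scale down (kHz) before a communication-bound phase ...        */
    write_sysfs(speed, "600000");       /* hypothetical low frequency  */

    /* ... communication-bound work (e.g., blocking MPI calls) ...     */

    /* Scale back up for the compute-bound phase.                      */
    write_sysfs(speed, "1400000");      /* hypothetical peak frequency */

    return 0;
}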