Large-scale parallel and distributed systems will be used to meet the computational demands of distributed simulations. These computer models enable scientific experiments that are otherwise prohibited by cost, capability, or treaty (e.g., nuclear weapons testing). Application performance efficiencies (often 5-10% of peak) must improve to realize the computational power of these systems and to decrease simulation time-to-solution significantly. As system size and complexity increase, maintaining such efficiencies will prove challenging. Additionally, the fundamental drive to increase peak performance using thousands of power-hungry components will lead to intolerable operating costs and failure rates. Without innovation, emergent petaflop systems will potentially require 100 megawatts of power (roughly the lighting requirements of a small city) while achieving a shrinking percentage of peak performance. We present our ongoing efforts to reduce the power-performance efficiency gap of distributed applications on large-scale systems. To improve performance efficiency, we propose an analytical model for analysis and prediction. Our model is fast, accurate, and capable of quantifying the performance impact of memory and middleware on distributed communication. We discuss the collaborative use of our model by Argonne National Laboratory researchers to improve MPI performance. To improve power efficiency, we use emergent power-aware technologies (e.g., dynamic voltage scaling, or DVS) to conserve up to 25% of energy for scientific codes on our 16-node Centrino-based Beowulf cluster without significant performance impact (<5%). Our innovative approaches compress the power-performance gap from below, by increasing achieved middleware performance, and from above, by decreasing theoretical peak system speed during periods of application inefficiency to conserve energy.
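
To illustrate the kind of decomposition such an analytical model performs, a generic LogGP-style point-to-point cost expression can be extended with explicit memory and middleware terms. The form below is a sketch under that assumption, not the specific model proposed here; the terms T_mem and T_mw are hypothetical placeholders for the memory-copy and middleware (e.g., MPI) overheads the model quantifies.

% Illustrative only: generic LogGP network cost (send overhead o_s,
% latency L, per-byte gap G for a k-byte message, receive overhead o_r),
% plus assumed memory-copy and middleware overhead terms.
\begin{equation*}
T_{\mathrm{p2p}}(k) \;\approx\;
\underbrace{o_s + L + (k-1)\,G + o_r}_{\text{network (LogGP)}}
\;+\; \underbrace{T_{\mathrm{mem}}(k)}_{\text{memory copies}}
\;+\; \underbrace{T_{\mathrm{mw}}(k)}_{\text{middleware overhead}}
\end{equation*}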
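
As an illustration of the power-aware mechanism referenced above, the following C sketch lowers a core's clock frequency through the Linux cpufreq "userspace" governor before a communication-bound phase and restores it afterward. The sysfs paths and frequency values are assumptions about a typical Centrino-class Linux node; the sketch is not the scheduling policy evaluated in this work.

/*
 * Minimal DVS sketch using the Linux cpufreq "userspace" governor:
 * reduce the core frequency around a communication-bound (CPU-idle)
 * phase, then restore it. Requires root and cpufreq support; the
 * frequencies below are hypothetical P-states for illustration only.
 */
#include <stdio.h>

static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;                      /* no cpufreq support or no permission */
    fprintf(f, "%s\n", value);
    fclose(f);
    return 0;
}

int main(void)
{
    const char *gov   = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor";
    const char *speed = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed";

    /* Take manual control of CPU 0's frequency. */
    write_sysfs(gov, "userspace");

    /* Scale down (kHz) before a communication-bound phase ...        */
    write_sysfs(speed, "600000");       /* hypothetical low frequency  */

    /* ... communication-bound work (e.g., blocking MPI calls) ...     */

    /* Scale back up for the compute-bound phase.                      */
    write_sysfs(speed, "1400000");      /* hypothetical peak frequency */

    return 0;
}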