Performance Analysis and Optimization for Scientific Applications

This proposal addresses the growing difficulty that scientific applications face in exploiting the available memory bandwidth. Processor speed increases by about 60% annually, while the speed of memory accesses increases by only 7% per year. Hierarchical memory organizations with caches at multiple levels above main memory have the potential to hide slow memory accesses. However, most programs were not designed to fully exploit caches, since the cache configuration differs from computer to computer. As a result, the pace of computation is often bound by the number of memory references and by the level of the memory hierarchy at which these references are resolved. We address this problem by employing binary rewriting techniques to analyze and optimize existing programs with regard to their memory performance while they run. This allows us to customize a program for a specific computer, i.e., the program becomes faster while it runs. While our methods are applicable to any computer program, we focus our efforts on scientific computing, i.e., simulation programs for problems in chemistry, physics and other natural sciences that, due to their complexity and data size, require massively parallel computers dedicated to solving them. We are developing tools to support memory performance analysis for such programs.
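
To make the growth rates quoted above concrete, the following is a minimal sketch of how the gap between processor and memory speed compounds. It assumes both rates stay constant and normalizes the ratio to 1.0 in year 0. The names `gap_after` and `doubling_time` are hypothetical helpers introduced only for this illustration.

```python
import math

# Annual growth rates as quoted in the text (assumed constant over time).
PROCESSOR_GROWTH = 0.60
MEMORY_GROWTH = 0.07

# Factor by which the processor/memory speed ratio grows each year.
ANNUAL_GAP_FACTOR = (1 + PROCESSOR_GROWTH) / (1 + MEMORY_GROWTH)


def gap_after(years: float) -> float:
    """Processor/memory speed ratio after `years`, relative to 1.0 at year 0."""
    return ANNUAL_GAP_FACTOR ** years


def doubling_time() -> float:
    """Number of years it takes for the speed ratio to double."""
    return math.log(2) / math.log(ANNUAL_GAP_FACTOR)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: gap factor {gap_after(years):.1f}")
    print(f"gap doubles roughly every {doubling_time():.2f} years")
```

Under these assumptions the relative gap widens by roughly half each year, which is why the level of the memory hierarchy at which a reference is resolved matters more with every processor generation.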