
Abstract

Profiling is one of the best methods for studying program behavior and performance. However, traditional profiling tools require recompiling the program from its original source code, which is often unavailable. We have developed a profiling tool that combines a binary rewriting tool, Dyninst, with the Unix profiling tool gprof, so that post-compiled programs can be profiled without recompilation. Experiments showed that our tool works well for most simple applications; further work is needed to support large-scale and parallel applications.

 

Introduction

Computer software and hardware have evolved over several decades, and the capability of computer systems has advanced greatly, most significantly in processing power. This achievement comes not only from innovative hardware such as CPUs, memory, and networks, but also from the software that manages those devices. The performance of a computer system depends heavily on how well the software takes advantage of the hardware. However, because of the complexity of both hardware and software, it is difficult to predict how they will behave before we actually run a program. Even then, we may be able to tell whether a program is fast or slow, but not exactly why. Therefore, people needed a way to see what is happening inside a program at run time.

Perhaps the best known method for studying the performance of software and systems is profiling. Profiling tools have been developed for more than thirty years, such as prof [7] and gprof [9] for Unix systems. Using these tools, programmers can observe actual program behavior during execution and improve performance by eliminating or reducing bottlenecks. However, even with tools like gprof, an end user who does not have access to the original source code cannot take advantage of profiling, because most of these tools require recompiling the source code with their profiling library.

Our goal for this project is to develop a tool that adds a profiling layer to a post-compiled binary, so that an end user can measure program performance even when the original source code is not available. This approach has several benefits compared with traditional profiling tools. First, binary instrumentation is done after compilation, which saves the nontrivial compilation time of large programs. Second, profiling can be done without access to the program source files, which gives computer vendors more flexibility to offer third-party performance analysis. Third, it is possible to enable partial program profiling at run time, which could be useful for dynamic code optimization.

 

Methodology

For this project we used two programs, gprof and Dyninst [2], to insert profiling routines into a post-compiled program, as described in Figure 1.

gprof is a profiling tool that tells a user where a program spends its time and which functions call which other functions during execution. This information helps the user identify pieces of the program that are slower than expected and might be candidates for rewriting to make the program run faster. To use gprof, the program must first be compiled and linked with profiling enabled (the -pg option in GCC [4]). During compilation, special system calls and instructions are inserted into the program to collect run-time information. The instrumented program is then executed, producing a profile data file. Finally, the profile data file is analyzed by invoking gprof with various options.

DynInst is a post-compiler program manipulation tool that provides a C++ class library for program instrumentation. Using this library, it is possible to instrument and modify application programs during execution. A unique feature of the library is that it permits machine-independent binary instrumentation programs to be written. With the APIs provided by DynInst, it becomes much simpler to build tools such as a utility that counts the number of times a function is called, a program that captures the output of an already running program to a file, or an implementation of conditional breakpoints.
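As a concrete illustration of the first example above, the sketch below counts how many times a single function is called by incrementing a counter at its entry point. It is only a sketch based on the DyninstAPI Programmer's Guide [2]: the target path ./hello and the function name print_me are placeholders, and class or method names may differ slightly between Dyninst releases.

    #include "BPatch.h"
    #include "BPatch_thread.h"
    #include "BPatch_image.h"
    #include "BPatch_function.h"
    #include "BPatch_point.h"
    #include "BPatch_snippet.h"

    int main() {
        BPatch bpatch;

        // Launch the target program under Dyninst control (path is illustrative).
        const char *argv[] = { "./hello", NULL };
        BPatch_thread *app = bpatch.createProcess("./hello", argv);
        BPatch_image *image = app->getImage();

        // Allocate an integer counter in the target's address space.
        BPatch_variableExpr *counter = app->malloc(*image->findType("int"));

        // Build the snippet "counter = counter + 1".
        BPatch_arithExpr addOne(BPatch_assign, *counter,
            BPatch_arithExpr(BPatch_plus, *counter, BPatch_constExpr(1)));

        // Insert the snippet at the entry point of the function to be counted.
        BPatch_Vector<BPatch_function *> funcs;
        image->findFunction("print_me", funcs);
        BPatch_Vector<BPatch_point *> *entry = funcs[0]->findPoint(BPatch_entry);
        app->insertSnippet(addOne, *entry);

        // Run the instrumented program to completion.
        app->continueExecution();
        while (!app->isTerminated())
            bpatch.waitForStatusChange();
        return 0;
    }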


The project was conducted as a sequence of tasks, as described in Figure 2.

The first task was to understand the profiling data file, gmon.out, generated for gprof, and to produce an identical file with our own implementation. We pursued two directions in parallel to shorten the search for the best approach: one person examined the GNU compiler, gcc, to see how it manipulates the target program to generate gmon.out, while the other reverse engineered the internals of gprof to understand the structure of the gmon.out file.

The second task was to study the detailed usage of Dyninst. In this step we learned how Dyninst can be used to modify a compiled program, and we wrote a small program that uses Dyninst instrumentation to insert user-specified function calls into an existing program and to link shared libraries.

The third task was to implement profiling via binary instrumentation with DynInst. Once the first two tasks were completed, we were ready to develop a program that transforms a post-compiled binary so that it produces the profiling data file, gmon.out, without recompiling the original source code. With Dyninst, we inserted a profiling trigger, the mcount function, into every function and appended our own implementation to generate gmon.out.




Task 1 – Building a profiling data generator

First, we wrote a very simple "Hello World" program and compiled it with and without the -pg flag to see how the two binaries differed. Since both output files were binary, we could not easily compare them directly. However, we knew additional code had been inserted when the flag was used, because the resulting program was about twice as large as the other one. We used the objdump [6] program to disassemble both binaries and see which routines were included. We found that the program compiled with the flag contained additional functions such as monstartup(), _mcleanup(), mcount(), and internal_mcount(), and that an mcount() call had been inserted at the beginning of every existing function, as shown in Figure 3. We further found that gmon.c in GCC contained all of these functions for collecting the profiling information and creating the gmon.out file.

We first tried to compile gmon.c under Linux but had much trouble; however, we were able to compile it successfully under Solaris (on Gearbox). Our hypothesis was that if we compiled a program together with gmon.c and added a call to the profiling function mcount() at the beginning of every function of the program, we should be able to generate the profiling data. As expected, we were able to generate gmon.out and verified that it was valid.
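For reference, the layout we had to reproduce is essentially the classic gprof data file format: a small header describing the sampled text range, followed by the PC-sampling histogram and a list of call-graph arc records. The sketch below follows the BSD/Solaris gmon.h declarations; field names and widths differ between platforms, so it is illustrative rather than a copy of our code.

    /* Sketch of the gmon.out layout (after the BSD/Solaris gmon.h declarations).
       Field names and sizes are illustrative; check the platform's gmon.h. */
    struct gmon_hdr_sketch {
        char *lowpc;            /* lowest sampled text address                 */
        char *highpc;           /* highest sampled text address                */
        int   ncnt;             /* size of the histogram buffer plus header    */
    };
    /* ...the header is followed by the histogram of PC-sample counters...     */

    struct gmon_arc_sketch {    /* one record per observed caller/callee pair  */
        char *frompc;           /* address of the call site in the caller      */
        char *selfpc;           /* entry address of the called function        */
        int   count;            /* number of times this arc was traversed      */
    };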

Compiled without -pg:

0000000000011510 <print_me>:
  save %sp, -112, %sp
  sethi %hi(0x21800), %g1
  or %g1, 0xc0, %o0
  nop
  sethi %hi(0x11400), %g1
  or %g1, 0x2a8, %o0
  sethi %hi(0x11400), %g1
  or %g1, 0x150, %o1
  call 217e0 <_PROCEDURE_LINKAGE_TABLE_+0xfc>
  nop
  call 114d8 <print_me_twice>
  nop
  nop
  ret
  restore

0000000000011550 <main>:
  save %sp, -120, %sp
  st %i0, [ %fp + 0x44 ]
  st %i1, [ %fp + 0x48 ]
  sethi %hi(0x21800), %g1
  or %g1, 0xc4, %o0
  nop
  clr [ %fp + -20 ]
  ld [ %fp + -20 ], %g1
  cmp %g1, 9
  bg 1159c <main+0x4c>
  nop
  call 11510 <print_me>
  ...

Compiled with -pg:

0000000000011510 <print_me>:
  save %sp, -112, %sp
  sethi %hi(0x21800), %g1
  or %g1, 0xc0, %o0
  call 10e8c <_mcount>
  nop
  sethi %hi(0x11400), %g1
  or %g1, 0x2a8, %o0
  sethi %hi(0x11400), %g1
  or %g1, 0x150, %o1
  call 217e0 <_PROCEDURE_LINKAGE_TABLE_+0xfc>
  nop
  call 114d8 <print_me_twice>
  nop
  nop
  ret
  restore

0000000000011550 <main>:
  save %sp, -120, %sp
  st %i0, [ %fp + 0x44 ]
  st %i1, [ %fp + 0x48 ]
  sethi %hi(0x21800), %g1
  or %g1, 0xc4, %o0
  call 10e8c <_mcount>
  nop
  clr [ %fp + -20 ]
  ld [ %fp + -20 ], %g1
  cmp %g1, 9
  bg 1159c <main+0x4c>
  nop
  call 11510 <print_me>
  ...

Figure 3. Difference in disassembled code from objdump for the Hello World program compiled without -pg (top) and with -pg (bottom).

Task 2 - Dyninst

Since we were only able to compile gmon.c under Solaris, and Dyninst was most stable on that platform as well, we chose Solaris as our target platform. Compiling Dyninst from source was not simple and did not succeed, so rather than spending more of our time on it, we downloaded a binary release of Dyninst for Solaris 2.8. There was another problem in working with Dyninst: because the package was so large, about 150 MB, it did not fit in our Unity space, so we had to keep it in a temporary directory (/var/tmp) at the risk of it being deleted. Again we started with a simple "Hello World" program. Even though we had some trouble setting up the required libraries for Dyninst, we were able to modify the "Hello World" program to call functions we specified.

Task 3 – Profiling with Dyninst via binary rewriting

Even though Dyninst allowed us to modify and add new code to an existing program, it was limited in terms of what we could do. It let us insert basic operations such as addition, subtraction, multiplication, and division on variables, or calls to existing functions. Performing more sophisticated operations was difficult, and the suggested approach was to put the logic in a shared library and call it from Dyninst. From gmon.c we created a shared library, libgmon.so, and loaded it with Dyninst. Loading the shared library was successful; however, we had difficulty calling the mcount() function for profiling. Because the original mcount() was written in assembly, to reduce the overhead of calling it so frequently, Dyninst could not recognize the function. Therefore, we created a new function, my_mcount(), which is a C version of mcount().
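The core of the instrumentation can be sketched as follows. The Dyninst calls are taken from the DyninstAPI Programmer's Guide [2] (exact signatures vary between releases), ./target is a placeholder path, and libgmon.so and my_mcount() are the library and function described above.

    #include "BPatch.h"
    #include "BPatch_thread.h"
    #include "BPatch_image.h"
    #include "BPatch_function.h"
    #include "BPatch_point.h"
    #include "BPatch_snippet.h"

    int main() {
        BPatch bpatch;
        const char *argv[] = { "./target", NULL };
        BPatch_thread *app = bpatch.createProcess("./target", argv);
        BPatch_image *image = app->getImage();

        // Map our C reimplementation of the gprof runtime into the target process.
        app->loadLibrary("libgmon.so");

        // Locate my_mcount() now that the library has been loaded.
        BPatch_Vector<BPatch_function *> mcountFuncs;
        image->findFunction("my_mcount", mcountFuncs);

        // Insert a call to my_mcount() at the entry of every function.
        BPatch_Vector<BPatch_function *> *funcs = image->getProcedures();
        for (unsigned i = 0; i < funcs->size(); i++) {
            BPatch_Vector<BPatch_snippet *> args;   // no arguments in this sketch
            BPatch_funcCallExpr call(*mcountFuncs[0], args);
            BPatch_Vector<BPatch_point *> *entry = (*funcs)[i]->findPoint(BPatch_entry);
            if (entry != NULL && entry->size() > 0)
                app->insertSnippet(call, *entry);
        }

        // Run the rewritten program; the profiling runtime produces gmon.out.
        app->continueExecution();
        while (!app->isTerminated())
            bpatch.waitForStatusChange();
        return 0;
    }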

The gmon.out generator needs the total size of the code segment to define the profiling region and to size its data structures. The original implementation obtains this information from a predefined variable, etext [5]. When GCC compiles a program, it automatically defines three symbols: etext, edata, and end. As shown in Figure 4, etext is the first address after the end of the text (code) segment, edata is the first address after the end of the initialized data segment, and end is the first address after the end of the data segment once the program is loaded. When we compiled a program together with gmon.c statically, we obtained the correct etext value. However, when we tried to use this value from our instrumentation, it always gave an incorrect address for the code segment. We also tried to calculate the code segment size from other sources such as sbrk, end, and edata, but had no success. Thus we needed another approach: instead of finding the code segment size inside the program, we decided to compute it from Dyninst. Dyninst can enumerate all the functions in a program along with their locations and sizes. With this information, we calculated the code segment size as the maximum, over all functions, of the function's address plus its size. We then made a function call, set_code_segment_size(), to set etext before mcount() is called for the first time.
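A sketch of that computation, continuing from the instrumentation sketch above: the address and size accessors (getBaseAddr(), getSize()) reflect the Dyninst release we used and may be named differently elsewhere, and set_code_segment_size() is our own helper in libgmon.so.

    // Compute the highest code address Dyninst knows about (address + size of
    // the last function) and pass it to libgmon.so before profiling starts.
    unsigned long highest = 0;
    BPatch_Vector<BPatch_function *> *funcs = image->getProcedures();
    for (unsigned i = 0; i < funcs->size(); i++) {
        unsigned long addr = (unsigned long)(*funcs)[i]->getBaseAddr();
        unsigned long end  = addr + (*funcs)[i]->getSize();  // size accessor: see note above
        if (end > highest)
            highest = end;
    }

    // Execute set_code_segment_size(highest) once in the target while it is
    // still stopped, so etext is set before the first my_mcount() call.
    BPatch_Vector<BPatch_function *> setters;
    image->findFunction("set_code_segment_size", setters);
    BPatch_Vector<BPatch_snippet *> args;
    args.push_back(new BPatch_constExpr((int)highest));  // 32-bit Solaris target assumed
    BPatch_funcCallExpr setSize(*setters[0], args);
    app->oneTimeCode(setSize);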

Another major problem was obtaining an incorrect address for the most recent function call. The profiling data generator needs two addresses, the address of the most recent function call and its return address, in order to record which function called which. Ordinarily, these addresses are pushed onto the stack whenever a function call occurs, and by looking at the stack we should have been able to obtain the most recently called function's address and its return address. However, whenever Dyninst needs to perform an operation, it jumps to its own code appended at the end of the program [3]. Therefore, when we called my_mcount(), it was not called from the beginning of the instrumented function where it logically belongs; instead, it was called from the Dyninst-appended code region, as shown in Figure 5.

To resolve this problem, we modified my_mcount() to take the address of the most recent function call as a parameter and to use it instead of the value obtained from the stack.
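A sketch of the fix, with both sides shown; as before, the exact Dyninst constructors are version-dependent, and my_mcount() here only hints at the real bookkeeping.

    /* C side, in libgmon.so: the instrumented function's own entry address is
       passed in as selfpc, instead of being derived from my_mcount's return
       address (which would point into the Dyninst-appended code region). */
    void my_mcount(unsigned long selfpc)
    {
        unsigned long frompc = 0;  /* caller's return address, still read from the
                                      instrumented function's frame (omitted here) */
        /* ... record the arc frompc -> selfpc in the gmon data structures ... */
        (void)frompc;
    }

On the Dyninst side, the call snippet built for each function passes that function's entry address as a constant argument:

    // Replacing the empty argument list in the earlier instrumentation sketch.
    BPatch_Vector<BPatch_snippet *> args;
    args.push_back(new BPatch_constExpr((int)(long)(*funcs)[i]->getBaseAddr()));
    BPatch_funcCallExpr call(*mcountFuncs[0], args);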

Figure 5. How Dyninst modifies the program to perform user-specified operations (right), compared with the original (left). This behavior causes an incorrect function call address to be observed.

 

Progress

Task 1 - Understand a profiling data file and generate gmon.out
Task 2 - Study the detailed usage of DynInst and load user-specified libraries
Task 3 - Implement profiling via binary instrumentation with DynInst

 

Results

We tested our tool on a number of programs to confirm its functionality and correctness. However, we still have some issues to overcome.

As a baseline, we started with the "Hello World" program. We compiled hello.c with the -pg flag, executed it to generate a profiling data file, gmon.out, and ran gprof to produce the profiling analysis. We then ran the original "Hello World" program, compiled without the -pg flag, through our profiling tool and compared the results.

The gprof outputs from both executions were exactly the same, as shown in Figure 6. This indicates that our profiling tool works properly for a simple application. We then picked another program, bzip2, a data compression tool. Here the gprof outputs differed slightly: they were almost identical, except that the binary compiled with the flag reported one extra function compared to ours. We do not profile any of our own code, such as my_mcount(), internal_mcount(), monstartup(), etc., but the binary compiled with the flag also profiled internal_mcount(). We are not sure why a simple application like "Hello World" did not show this difference while a larger program such as bzip2 did. We also tested gzip, another compression tool, and it showed the same behavior. However, this is probably not a significant issue, since not profiling our own instrumentation code is almost certainly preferable for users.

Without -pg (our tool):

granularity: each sample hit covers 4 byte(s) no time propagated

                                  called/total       parents
index  %time    self  descendents  called+self    name           index
                                  called/total       children

                0.00        0.00      10/10        main [3]
[1]      0.0    0.00        0.00      10         print_me [1]
                0.00        0.00      10/10            print_me_twice [2]
-----------------------------------------------
                0.00        0.00      10/10        print_me [1]
[2]      0.0    0.00        0.00      10         print_me_twice [2]
-----------------------------------------------
                0.00        0.00       1/1         _start [15]
[3]      0.0    0.00        0.00       1         main [3]
                0.00        0.00      10/10            print_me [1]
-----------------------------------------------

With -pg:

granularity: each sample hit covers 4 byte(s) no time propagated

                                  called/total       parents
index  %time    self  descendents  called+self    name           index
                                  called/total       children

                0.00        0.00      10/10        main [3]
[1]      0.0    0.00        0.00      10         print_me [1]
                0.00        0.00      10/10            print_me_twice [2]
-----------------------------------------------
                0.00        0.00      10/10        print_me [1]
[2]      0.0    0.00        0.00      10         print_me_twice [2]
-----------------------------------------------
                0.00        0.00       1/1         _start [15]
[3]      0.0    0.00        0.00       1         main [3]
                0.00        0.00      10/10            print_me [1]
-----------------------------------------------



Figure 6. Comparison of profiling information for the Hello World program compiled without -pg and profiled with our tool (top) and compiled with -pg (bottom). The outputs are exactly the same.

Besides not profiling internal_mcount(), we had two other issues.

First, some programs cannot be profiled. We tested basic Unix programs such as ls (listing files), cat (displaying file contents), and cal (calendar), and none of them could be used with gprof. Our program was able to execute these programs and generate profile data files; however, when we tried to view the profiling information with gprof, it failed because it could not read symbol data from the programs, as shown in Figure 7. This is because the symbol tables of these programs had been stripped to reduce their size [8].

Another issue was that when we tried to profile large applications such as X Window programs, our tool either crashed or did not function properly. When we executed gnome-terminal and Visual SlickEdit, both X Window applications, under our program, it crashed with an error message reported by Dyninst. When we tried gzip, it ran without an error this time but produced invalid output. We suspect these issues may be another problem with Dyninst, but we are not certain.

 

Conclusion

Our profiling tool combines features of a widely used traditional Unix profiling tool, gprof, and an innovative dynamic binary rewriting tool, Dyninst, to give an end user a way to profile post-compiled programs without the original source code. In addition, because both gprof and Dyninst support a variety of platforms, our tool can be used on many different types of computer systems. As shown in the results section, it works for the majority of Unix programs. A few programs that do not contain the necessary information, such as function names and symbol tables, cannot be profiled, because gprof is unable to read that information even though our tool still instruments them and generates profiling data. We recognize this limitation, but there is no good solution for it at the moment.

There is one area in which we would like to extend our tool. Although it was developed for serial programs on Solaris, the primary goals of our design were flexibility and extensibility. As such, we are positioning the tool to work with parallel programs as well. In particular, there are opportunities to apply mpiP (MPI profiling) [1] along with gprof and Dyninst to support parallel programs.

 

Reports

Part 1 - Project Proposal [PDF]

Part 2 - Project Progress Report [PDF]

Final - Project Final Report [PDF]

 

References

[1] Jeffrey Vetter and Chris Chambreau. mpiP: Lightweight, Scalable MPI Profiling. Version 2.8.2, April 2005.

[2] Computer Science Department, University of Maryland. DyninstAPI Programmer's Guide, Release 4.2. March 2005.

[3] Bryan R. Buck and Jeffrey K. Hollingsworth. An API for Runtime Code Patching. International Journal of High Performance Computing Applications, 14(4), Winter 2000.

[4] Free Software Foundation, Inc. GCC 3.4.3 Manual. November 2004.

[5] Tool Interface Standards (TIS). Executable and Linkable Format (ELF), Portable Formats Specification, Version 1.1. September 2004.

[6] Free Software Foundation, Inc. Objdump Manual. May 2004.

[7] Sun Microsystems. Unix manual page for prof. SunOS 5.8, November 1999.

[8] Sun Microsystems. Unix manual page for file. SunOS 5.8, April 1996.

[9] Sun Microsystems. Unix manual page for gprof. SunOS 5.8, July 1998.


Dr. Frank Mueller
Kyung Chul Lee
Heshan Lin