| |
Methodology
Overview
Our instrumentation of arbitrary binaries took place over the course
of four stages. The first stage consisted of a familiarization with the tools we
would be using (Gprof, Gnu C Compiler, and Dyninst). The second stage consisted
of proving we could insert profiling code directly into the source of an
application and ensuring the profile was correct. The third stage extended the
second by ensuring that the inserted source code could be called from a dynamic
library. This additional stage proved necessary because Dyninst calls its
snippets from a dynamic library. The final stage consisted of dynamically
inserting profiling instrumentation directly into arbitrary binaries using
Dyninst.
Stage 1: Familiarization with tools
The first step of the project was becoming familiar with the tools we would be
using to develop a method of instrumenting binaries with Gprof profiling. The
tools we focused on were Gprof, the Gnu C Compiler (gcc), and Dyninst. Mike
Noeth and Jyothish Varma worked to become team experts on Gprof and gcc while
Tao Yang became the expert on Dyninst.
Through use of GDB and examination of the gcc source code, we discovered that a
function called mcount was called at the beginning of every function to be
profiled by Gprof. On the first call to mcount, all data structures for
profiling were setup, and an initial call to profil (a system call to perform a
stack walk and collect information every 10 milliseconds – see man profil for
more information) was made. Subsequent calls to
mcount resulted in tallying calls to each individual function while the profil
continued to perform its stack walk every 10 milliseconds. The final call to
mcount resulted in all the data collected being written to file (gmon.out) and
stopping the profil stack walk. The mcount function was extracted from the gcc
source code and compiled into an object file from which we could link the mcount
call into our programs.
Familiarization with Dyninst was broken down into three major tasks to be
prototyped. The first task was to specify a binary for Dyninst to instrument
with code. This would later allow our tool to instrument a target binary with
Gprof profiling code. The second task required that we be able to identify all
the functions used in a program. This would later allow our tool to identify the
functions to be prototyped. The third task required that we be able to insert
code at the beginning of a function. This would later allow our tool to
instrument the necessary call to mcount at the beginning of each function.
Stage 2: Static compilation of mcount
The second stage consisted of creating a simple application and inserting
profiling code into the existing source code by hand. This would ensure that we
understood how mcount was working, and that we knew when and where to call it.
Two binaries were compiled: one with the -pg option specified and the other with
the profiling code inserted by hand. Both binaries generated gmon.out profiles
that could be examined using Gprof.
To complete this stage, we needed to ensure the resultant profiles of both
binaries matched. We began this stage by creating a micro benchmark that would
serve as our application (Micro benchmark source code). The benchmark made
multiple function calls to create a somewhat complex call graph. We compiled the
micro benchmark and ensured it executed as expected. Next, we modified the micro
benchmark with calls to mcount at the beginning of each function. The modified
micro benchmark was linked to the object file containing mcount obtained in
stage 1.
We compiled two binaries: the first was the unmodified micro benchmark compiled
with the -pg option specified; the second was the modified micro benchmark
linked with the mcount code obtained in stage 1. Both binaries generated
gmon.out files which were examined using Gprof. The resultant profiles matched
exactly in the number of calls made to each function as well as the time spent
in each function. From here we decided
we were ready to move onto the next stage.
Stage 3: Dynamic compilation of mcount
Originally, we did not plan on a third stage. We thought that after proving our
design via stage 2, we would be able to move on to stage 4 (implementing a tool
to instrument binaries with Gprof profiling). After attempting stage 4 and
seeing no success, we had to back track. Reviewing Dyninst documentation
revealed that the code we were inserting was called from a dynamic library. Thus
we decided that we had to call the mcount function from a dynamic library (by
hand) to ensure this was not the cause of failure in stage 4. The mcount
function discovered in stage 1 was compiled into a dynamic library. We loaded
our dynamic library (appendix A – Wrapper source code) and rather than making
static calls to mcount in the modified micro benchmark, we called mcount from
the dynamic library. This additional stage revealed that part of our problem was
due to name collisions. When gcc is compiled and installed, it adds an
additional library with mcount and its supporting
functions. Rather than calling our profiling code from the dynamic library we
created, the functions from the library installed with gcc were being invoked.
By renaming mcount and all the supporting functions, we were able to avoid the
name collisions.
Stage 4: Dynamic instrumentation of mcount
With a new dynamic library created from stage 3 (with new names for mcount
and its supporting functions) we attempted to create our tool again. We simply
attempted to call the renamed mcount at the beginning of each function for an
arbitrary binary. There were two complications at this phase. The first
complication was that mcount requires knowing the address range of the code it
is profiling, but this information was being
corrupted due to the use of dynamic libraries. The second complication was that
Dyninst was messing up the stack. Thus, the stack walk code was unable to
perform correctly. The first call to mcount setup data structures for tracking
the profiling data. To allocate space for this data to be collected, mcount
requires knowing the address range of the program being profiled. In the gcc
version of mcount, a hard coded value is used to
specify the beginning address of the code segment, and a function etext (we
believe this stands for end of text segment) returns the address of the last
function in the text segment of a program to be used as the ending address of
the code segment. The ranges were coming out incorrect due to another naming
collision with etext. The etext function was not returning the correct address
anymore. To resolve this issue, we used Dyninst to scan all the functions be
profiled, collect their addresses, and pick the largest and smallest address
found. The smallest address specified the beginning of the range while the large
address specified the end of the range. With this modification, we were able to
work around the name collision of etext. In order to create a call graph, Gprof
requires knowledge of which function called mcount (we will refer to this
function as “Caller #1”), and knowledge of which function called Caller #1 (we
will refer to this function as “Caller #2”). To obtain these values, the mcount
code uses a system call __builtin_addr(int level). To get Caller #1, mcount uses
__builtin_addr(0), and to get Caller #2, mcount uses __builtin_addr(1). When
Dyninst is used to insert calls to mcount, the stack is corrupted (see figure
1). Due to Dyninst’s method of inserting code, mcount could no longer determine
Caller #2. We tried to use increasing levels of __builtin_addr, but these
resulted in memory address
from the heap (where Dyninst created its own fake stack). When the top of the
stack is reached, any higher levels cause the application to segfault.
To work around the Dyninst stack, we implemented
our own simple stack (Fake stack source code). Upon entering a new function, we
pushed that functions address onto the stack. Upon leaving a function, we popped
the address off of the stack. We modified mcount so that it would use the
address of the item on the top of the stack for Caller #1 (rather than
__builtin_addr(0)) and the item one away from the top of the stack
for Caller #2 (rather than __builtin_addr(1)). With this modification, we were
able to work around the stack corruption that Dyninst introduced.
After implementing our own range finder and stack, we were able to generate the
correct profiles for arbitrary binaries. The proof of our design is shown in the
next section.
|