Etch is a system for evaluating and optimizing the performance of application programs, developed for Intel x86 platforms running the Windows/NT operating system. The system allows you to annotate existing binaries with arbitrary instructions (for example, for tracing or coverage analysis), or to rewrite an existing binary so that it executes more efficiently.
Etch works directly on x86 executables. It does not require program source code for either measurement or optimization.
Etch is targeted at two different user groups: developers, who wish to understand the performance of their programs during the development cycle, and users, who wish to understand and improve the performance of common applications executing in their environment.
Etch provides both groups with measurement tools to evaluate performance at several levels of detail, and optimization tools to automatically restructure programs to improve performance, where possible.
Etch reads the executable binaries (and, under Win32, DLLs) for an application, modifies the image, and writes a new one that has been enhanced for measurement or optimization. The transformations that Etch performs on the binary do not change program correctness, although a program transformed to collect performance measurements will run more slowly. Etch does not require changes to the operating system, but a binary modified by Etch may use OS facilities, such as software timers, or even implementation-specific facilities, such as Intel Pentium performance counters.
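There are three key concepts in using Etch: instrumenting an executable, running the instrumented executable to collect data, and post-processing the collected data.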
To instrument a program, Etch is invoked with the name of an executable and a DLL. The DLL provides a set of routines that are invoked for each instruction in the executable. Roughly, Etch operates as follows:
    for each instruction in executable
        InstrumentBefore(instruction);
        InstrumentAfter(instruction);
    end;
    InstrumentBeforeMainProgram();
    InstrumentAfterMainProgram();
The instrumentation tool provides implementations of these "Before" and "After" functions. The callback functions can in turn direct Etch to modify the executable with respect to the specific instruction. The directions in effect say "before (or after) this instruction runs, please call some specific function with some specific set of arguments." For example, to count instructions, the InstrumentBefore procedure would direct Etch to insert code that increments a counter at runtime. These inserted instructions do not change the correctness of the program.
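As a rough illustration of the callback scheme described above, the following Python sketch models an instruction-counting tool. It is not the Etch programming interface: the "program" is just a list of strings, and every name here (rewrite, run, insert_call, count_one) is a hypothetical stand-in chosen for this example. Only the idea of a "Before" and an "After" callback per instruction comes from the description above.

```python
# Toy model of per-instruction callbacks. NOT the Etch API: all names are
# hypothetical, and the "binary" is simply a list of instruction strings.

counter = {"instructions": 0}


def count_one():
    """Runtime routine: the code a tool asks to have inserted."""
    counter["instructions"] += 1


def instrument_before(instruction, insert_call):
    """Tool callback: request a call to count_one before `instruction`."""
    insert_call(count_one)


def instrument_after(instruction, insert_call):
    """Tool callback: this tool inserts nothing after an instruction."""


def rewrite(program):
    """Stand-in for the rewriter: visit each instruction once and record
    the calls that the tool asked to have inserted around it."""
    rewritten = []
    for instruction in program:
        before, after = [], []
        instrument_before(instruction, before.append)
        instrument_after(instruction, after.append)
        rewritten.append((before, instruction, after))
    return rewritten


def run(rewritten):
    """Stand-in for running the new executable: the inserted calls fire
    as a side effect, and the original instructions are left unchanged."""
    executed = []
    for before, instruction, after in rewritten:
        for hook in before:
            hook()
        executed.append(instruction)
        for hook in after:
            hook()
    return executed


program = ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
executed = run(rewrite(program))
print(counter["instructions"], "instructions executed")
```

The split between rewrite and run mirrors the two phases in the text: the callbacks run once per instruction when the binary is scanned, while the routines they request run every time that instruction executes.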
Once the entire executable has been scanned and instrumented,
Etch writes a new version of the executable that can be run. Any
functions referenced in the callback routines, as well as the
Etch runtime library, are included in the new executable.
The executable written by Etch can be run, and any instrumentation
routines will run as a side effect of running the program. Instrumentation
routines, as the program is running, can inspect the state of
the program, for example, the contents of registers, or effective
addresses. All addresses, whether text or data, are relative to
the original binary, so the collection routines do not have to
compensate for the fact that they are part of a modified executable.
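The text does not say how Etch keeps addresses relative to the original binary. One way a rewriter could do it, shown here purely as an assumption, is to record a map from each instruction's new location to its original location and consult that map before handing an address to a collection routine. The probe size, the addresses, and the names layout_with_probes and report_pc are all hypothetical.

```python
# Hypothetical sketch of address translation; not Etch's actual mechanism.

def layout_with_probes(original, probe_size=5):
    """original: list of (original_address, size_in_bytes) per instruction.

    Assumes a probe of `probe_size` bytes is inserted before every
    instruction, and returns a dict mapping new address -> original address.
    """
    new_to_orig = {}
    cursor = original[0][0]          # the rewritten code starts at the same base
    for orig_addr, size in original:
        cursor += probe_size         # room taken by the inserted probe
        new_to_orig[cursor] = orig_addr
        cursor += size               # the original instruction itself
    return new_to_orig


original = [(0x401000, 1), (0x401001, 2), (0x401003, 2), (0x401005, 1)]
new_to_orig = layout_with_probes(original)


def report_pc(new_pc):
    """What a collection routine would see: the original address."""
    return new_to_orig[new_pc]


for new_pc in sorted(new_to_orig):
    print(hex(new_pc), "->", hex(report_pc(new_pc)))
```

With a translation like this, a trace gathered from the modified executable can be compared directly against the original binary.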
When an Etched program terminates, its data collection routines
can save information about the executable to disk. Later, post-processing
utilities can examine the data. For example, a predicted execution
time can be determined after the fact based on hypothetical processor,
cache, and memory speeds. At a lower level, detailed information
about a program's performance can be obtained, such as that shown
below in the graph of instruction cache performance for a collection
of popular Win32 programs. The graph shows the miss penalty
of the first-level instruction cache and a second-level unified
cache for the Perl interpreter, three commercial C++ compilers,
and Microsoft Word.
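To make the idea of an after-the-fact prediction concrete, here is a minimal Python sketch of a post-processing step. The cost model (a fixed cycles-per-instruction figure plus a fixed penalty per cache miss), the counts, and the penalty values are all assumptions chosen for illustration; they are not measurements and not the model used by the Etch tools.

```python
# Illustrative post-processing model. All numbers below are made up.

def predicted_cycles(instructions, l1_misses, l2_misses,
                     cpi=1.0, l1_miss_penalty=10, l2_miss_penalty=50):
    """Estimate cycles from saved counts and hypothetical machine parameters."""
    return (instructions * cpi
            + l1_misses * l1_miss_penalty
            + l2_misses * l2_miss_penalty)


def predicted_seconds(cycles, clock_hz):
    """Convert a cycle estimate to seconds for a given clock rate."""
    return cycles / clock_hz


# Counts as they might be saved to disk by a collection routine (hypothetical).
trace = {"instructions": 1_000_000, "l1_misses": 20_000, "l2_misses": 2_000}

baseline = predicted_cycles(**trace)
faster_memory = predicted_cycles(**trace, l2_miss_penalty=25)

print("baseline cycles:      ", baseline)
print("with faster memory:   ", faster_memory)
print("baseline time, 90 MHz:", predicted_seconds(baseline, 90e6), "s")
```

Because the counts are saved once and the machine parameters are supplied afterwards, the same run can be re-evaluated under as many hypothetical configurations as needed.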
Etch also provides facilities for rewriting an executable in order to improve its performance. For example, the instrumentation phase, rather than adding new instructions, can direct Etch to write the executable out according to a different code layout optimized for cache and VM behavior.
The graph below shows the reduction in instruction cache misses
and execution time (in cycles) for a collection of popular Win32
programs that have been optimized for code layout using Etch on
a 90 MHz Pentium. Etch was first used to discover the programs'
locality while executing against a training set, and the programs
were then rewritten to achieve a tighter cache and VM packing.
Infrequently executed basic blocks were moved out of line, and
frequently interacting basic blocks were laid out contiguously in
the executable. The results were measured using inputs different
from those used during training.
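The layout strategy described above can be sketched with a simple greedy heuristic. This is an illustration under stated assumptions, not the algorithm Etch uses: the block names, the profile counts, the hot/cold threshold, and the function layout are all hypothetical.

```python
# Greedy profile-guided block layout. A simplified illustration only.

def layout(blocks, counts, edges, hot_threshold=1):
    """Return a new block order.

    blocks: block names in their original order
    counts: block name -> execution count from the training run
    edges:  (src, dst) -> number of observed transitions from src to dst
    Blocks executed fewer than `hot_threshold` times are moved out of line
    (to the end); hot blocks are chained along their heaviest edges.
    """
    hot = [b for b in blocks if counts.get(b, 0) >= hot_threshold]
    cold = [b for b in blocks if counts.get(b, 0) < hot_threshold]

    placed = []
    remaining = set(hot)
    current = None
    while remaining:
        best, best_weight = None, 0
        if current is not None:
            # Prefer the unplaced block most often reached from `current`.
            for b in hot:
                weight = edges.get((current, b), 0)
                if b in remaining and weight > best_weight:
                    best, best_weight = b, weight
        if best is None:
            # No profiled edge to follow: start a new chain at the hottest
            # remaining block (original order breaks ties).
            best = max((b for b in hot if b in remaining),
                       key=lambda b: counts.get(b, 0))
        placed.append(best)
        remaining.remove(best)
        current = best
    return placed + cold


blocks = ["entry", "check", "error", "loop", "cleanup_rare", "exit"]
counts = {"entry": 1, "check": 1000, "error": 0,
          "loop": 1000, "cleanup_rare": 0, "exit": 1}
edges = {("entry", "check"): 1, ("check", "loop"): 999,
         ("loop", "check"): 999, ("check", "exit"): 1}

new_order = layout(blocks, counts, edges)
print(new_order)
```

In this made-up profile, the two blocks that call each other almost a thousand times end up adjacent, and the two blocks that never ran during training are pushed to the end, where they no longer occupy cache lines and pages shared with hot code.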
In addition to a programming interface, Etch also offers a graphical
user interface for performing common instrumentation and optimization
operations. The user interface can drive the measurement process:
it runs Etch on the original binary to produce a new binary, modified
to collect the necessary behavioral data; it executes the modified
binary to produce the data; and it feeds the data to analysis
tools that produce graphs or charts that help to pinpoint problems.
Once a problem has been identified, the user may instruct Etch
to perform a performance-optimization transformation. For example,
Etch may rewrite the original binary to change the layout of data
or code in order to improve cache or virtual memory performance.
Etch runs on Intel 486, Pentium, and P6 processors with at least 24 MB of memory. Etch works on 32-bit (Win32) binaries. It has been used on programs built with the MSVC, Borland, and Intel compilers.
If you are interested in obtaining more information about Etch, please contact etch-info@cs.washington.edu.
Etch is due to the efforts of people at Harvard University and the University of Washington. These include: Dennis Lee, Ted Romer, Geoff Voelker, Alec Wolman, Wayne Wong, Brad Chen, Brian Bershad, and Hank Levy.
Copyright (c) 1997 The University of Washington. All rights reserved.