A Java agent to generate /tmp/perf-<pid>.map files for just-in-time (JIT) compiled methods for use with the Linux perf tools.
Make sure JAVA_HOME is configured to point to a JDK. You need cmake >= 2.8.6 (see #30). Then run the following on the command line:
cmake .
make
# will create links to run scripts in <somedir>
bin/create-links-in <somedir>
Linux perf tools will expect symbols for code executed from unknown memory regions at /tmp/perf-<pid>.map. This allows runtimes that generate code on the fly to supply dynamic symbol mappings to be used with the perf suite of tools.
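To make the mapping concrete, here is a minimal sketch of how an address could be resolved against such a file. It relies on the line format that perf's JIT interface expects (START SIZE symbolname, with START and SIZE as hex numbers without a 0x prefix). The function name resolve_addr and the sample symbol names are hypothetical and only serve as illustration; perf does this lookup internally.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_addr is a hypothetical helper, not part of perf-map-agent.
# Usage: resolve_addr <map-file> <hex-address-without-0x>
# Prints the name of the first entry whose range [START, START+SIZE) covers the address.
resolve_addr() {
  local map_file=$1
  local addr=$((0x$2))
  local start size name s n
  # "name" receives the rest of the line, so symbol names may contain spaces.
  while read -r start size name; do
    # Skip blank or incomplete lines.
    [ -n "$name" ] || continue
    s=$((0x$start))
    n=$((0x$size))
    if [ "$addr" -ge "$s" ] && [ "$addr" -lt $((s + n)) ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done < "$map_file"
  return 1
}

# Hypothetical sample content in the START SIZE symbolname format.
sample_map=$(mktemp)
cat > "$sample_map" <<'EOF'
7f3c4c0a1000 40 Lcom/example/Foo;::bar
7f3c4c0a1040 1c Interpreter
EOF

resolve_addr "$sample_map" 7f3c4c0a1044   # prints: Interpreter
rm -f "$sample_map"
```

The linear scan keeps the sketch short; a real tool would sort the entries and search them more efficiently.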
perf-map-agent is an agent that will generate such a mapping file for Java applications. It consists of a Java agent written in C and a small Java bootstrap application which attaches the agent to a running Java process.
When the agent is attached it instructs the JVM to report code blobs generated by the JVM at runtime for various purposes. Most importantly, this includes JIT-compiled methods but also various dynamically-generated infrastructure parts like the dynamically created interpreter, adaptors, and jump tables for virtual dispatch (see vtable and itable entries). The agent creates a /tmp/perf-<pid>.map file which it fills with one line per code blob that maps a memory location to a code blob name.
The Java application takes the PID of a Java process as an argument and an arbitrary number of additional arguments which it passes to the agent. It then attaches to the target process and instructs it to load the agent library.
The bin directory contains a set of shell scripts to combine common perf operations with creating the map file. The scripts will use sudo to call perf scripts.
create-java-perf-map.sh <pid> <options*>: takes a PID and options. It knows where to find libraries relative to the bin directory.
perf-java-top <pid> <perf-top-options>: takes a PID and additional options to pass to perf top. Uses the agent to create a new /tmp/perf-<pid>.map and then calls perf top with the given options.
perf-java-record-stack <pid> <perf-record-options>: takes a PID and additional options to pass to perf record. Runs perf record -g -p <pid> <perf-record-options> to collect performance data including stack traces. Afterwards it uses the agent to create a new /tmp/perf-<pid>.map file.
perf-java-report-stack <pid> <perf-record-options>: first calls perf-java-record-stack <pid> <perf-record-options> and then runs perf report to directly analyze the captured data. You can call perf report -i /tmp/perf-<pid>.data again with any options after the script has exited to further analyze the data from the previous run.
perf-java-flames <pid> <perf-record-options>: collects data with perf-java-record-stack and then creates a visualization using @brendangregg's FlameGraph tools. To get meaningful stack traces spanning several JIT-compiled methods, you need to run your JVM with -XX:+PreserveFramePointer (which is available starting from JDK8 update 60 build 19) as detailed in a Netflix blog entry.
create-links-in <targetdir>: will install symbolic links to the above scripts into <targetdir>.
Environment variables:
PERF_MAP_OPTIONS: a string of additional options to pass to the agent as described below.
PERF_RECORD_SECONDS: the number of seconds perf-java-report-stack and similar tools will record performance data.
PERF_RECORD_FREQ: the sampling frequency as passed to perf record -F.
FLAMEGRAPH_DIR: the directory into which @brendangregg's FlameGraph has been checked out.
PERF_JAVA_TMP: the directory to put temporary files in; the default is /tmp.
PERF_DATA_FILE: the file name to which perf-java-record-stack will write performance data; the default is $PERF_JAVA_TMP/perf-<pid>.data.
PERF_FLAME_OUTPUT: the file name to which the flamegraph SVG will be written; the default is flamegraph-<pid>.svg.
PERF_FLAME_OPTS: options to pass to flamegraph.pl (found in FLAMEGRAPH_DIR); the default is --color java.
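As an illustration of how these variables combine, the following sketch wraps a perf-java-flames call with a few of them set for that one invocation. All values are assumptions chosen for the example (30 seconds, 99 Hz, a FlameGraph checkout under $HOME/FlameGraph, an output file named myapp-<pid>.svg). The function name flame_profile and the AGENT_BIN_DIR variable are hypothetical helpers, not something the scripts themselves define.

```shell
#!/usr/bin/env bash
# Sketch only: flame_profile and AGENT_BIN_DIR are hypothetical names.
# Usage: flame_profile <pid>
flame_profile() {
  local pid=$1
  # The variables are set only for this single command, so the calling
  # shell's environment stays untouched. AGENT_BIN_DIR defaults to "bin",
  # i.e. the function is assumed to run from the perf-map-agent checkout.
  PERF_RECORD_SECONDS=30 \
  PERF_RECORD_FREQ=99 \
  FLAMEGRAPH_DIR="${FLAMEGRAPH_DIR:-$HOME/FlameGraph}" \
  PERF_FLAME_OUTPUT="myapp-$pid.svg" \
  PERF_MAP_OPTIONS=msig \
    "${AGENT_BIN_DIR:-bin}/perf-java-flames" "$pid"
}
```

Calling flame_profile with the PID of a running JVM would then record for the configured duration and write the SVG to the chosen file name.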
You can add a comma-separated list of options to perf-java (or the AttachOnce runner). These options are currently supported:
unfold: Create extra entries for every code block inside a method that was inlined from elsewhere (named <inlined_method> in <root_method>). Be aware of the effects of 'skid' in relation with unfolding; see the section below. Also see the section below about inaccurate inlining information.
unfoldall: Similar to unfold but will include the complete inlined stack at a code location in the form root_method->inlined method 1->inlined method 2->...->inlined method on top.
unfoldsimple: Similar to unfold, however, the extra entries do not include the " in <root_method>" part.
msig: Include the full method signature in the name string.
dottedclass: Convert class signatures (Ljava/lang/Class;) to the usual class names with segments separated by dots (java.lang.Class). NOTE: this currently breaks coloring when used in combination with flamegraphs.
sourcepos: Adds the name of the source file and the line number on which it is declared for each method. Useful when profiling Scala applications that create a lot of synthetic classes and methods. Does not work with native methods.
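The naming schemes described above can be sketched in a few lines of shell. This is only an illustration of the resulting entry names, not the agent's actual C implementation. The helper names dotted_class and unfold_name are hypothetical, and the sketch assumes that unfold and unfoldsimple name the innermost (top) inlined method.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical illustrations of the naming rules.

# Convert a class signature such as Ljava/lang/Class; to java.lang.Class.
# Anything that is not of the form L...; is printed unchanged.
dotted_class() {
  local sig=$1
  case $sig in
    L*\;)
      sig=${sig#L}   # drop the leading "L"
      sig=${sig%;}   # drop the trailing ";"
      printf '%s\n' "$sig" | tr '/' '.'
      ;;
    *)
      printf '%s\n' "$sig"
      ;;
  esac
}

# Usage: unfold_name <mode> <root_method> <inlined_method>...
# The inlined methods are given from outermost to innermost.
unfold_name() {
  local mode=$1 root=$2
  shift 2
  local name top='' chain=$root
  for name in "$@"; do
    top=$name                  # ends up as the innermost inlined method
    chain="$chain->$name"      # full stack, as used by unfoldall
  done
  case $mode in
    unfold)       printf '%s in %s\n' "$top" "$root" ;;
    unfoldsimple) printf '%s\n' "$top" ;;
    unfoldall)    printf '%s\n' "$chain" ;;
    *)            return 1 ;;
  esac
}
```

For example, unfold_name unfoldall a b c prints a->b->c, while unfold_name unfold a b c prints c in a.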
You should be aware that instruction level profiling is not absolutely accurate but suffers from 'skid'. 'skid' means that the actual instruction pointer may already have moved a bit further when a sample is recorded. In that case, (possibly hot) code is reported at an address shortly after the actual hot instruction. See this sample from one of Brendan's presentations demonstrating this issue.
If using unfold, perf-map-agent will report sections that contain code inlined from other methods as separate entries.
Unfolded entries can be quite short, e.g. an inlined getter may only consist of a few instructions that now live inside of another method's JITed code. The next few instructions may then already belong to another entry. In such a case, it is more likely that skid will not only affect the instruction pointer inside of a method entry but may affect which entry is chosen in the first place.
Skid that occurs inside a method is only visible when analyzing the actual assembler code (as with perf annotate). Skid that affects the actual symbol resolution to choose a wrong entry will be much more visible as wrong entries will be reported with tools that operate on the symbol level like the standard views of perf report, perf top, or in flame graphs.
So, while it is tempting to enable unfolded entries for the perceived extra resolution, this extra information is sometimes just noise which will not only clutter the overall view but may also be misleading or wrong.
Hotspot does not retain line number and other debug information for inlined code at other places than safepoints. This makes sense because you don't usually observe code running between safepoints from the JVM's perspective. This is different when observing a process from the outside like with perf. For observed code locations outside of safepoints, the JVM will not report any inlining information and perf-map-agent will assign those areas to the host method of the inlining.
For more fidelity, Hotspot can be instructed to include debug information for non-safepoints as well. Use -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints when running the target process. Note, however, that this will produce a lot more information, with the generated perf-<pid>.map file potentially growing to several megabytes in size.
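Putting the JVM flags mentioned in this README together, a target process could be started roughly as sketched below. The helper name profiling_jvm_opts and the app.jar placeholder are hypothetical. The flags themselves are the ones named above, including -XX:+PreserveFramePointer, which is recommended for perf-java-flames.

```shell
#!/usr/bin/env bash
# Sketch only: profiling_jvm_opts is a hypothetical helper that prints the
# JVM flags discussed in this README, one per line.
profiling_jvm_opts() {
  printf '%s\n' \
    '-XX:+PreserveFramePointer' \
    '-XX:+UnlockDiagnosticVMOptions' \
    '-XX:+DebugNonSafepoints'
}

# Example launch of the target process (app.jar is a placeholder):
#   java $(profiling_jvm_opts) -jar app.jar
```

Keeping the flags in one place makes it easier to drop -XX:+DebugNonSafepoints again if the map file grows too large.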
Unloading or reloading of a changed agent library is not supported by the JVM (but re-attaching is). Therefore, if you make changes to the agent and recompile it you need to restart a target process that has an older version loaded to use the newer version.
I'm not a professional C code writer. The code is very "experimental" and is, for example, missing checks for error conditions. Use it at your own risk. You have been warned!
This library is licensed under GPLv2. See the LICENSE file.