gProfiler combines multiple sampling profilers to produce a unified visualization of what your CPU is spending time on, displaying stack traces of your processes across native programs[1] (including Golang), Java and Python runtimes, and kernel routines.
gProfiler can upload its results to the Granulate Performance Studio, which aggregates the results from different instances over different periods of time and can give you a holistic view of what is happening on your entire cluster. To upload results, you will have to register and generate a token on the website.
gProfiler runs on Linux (on x86_64 and Aarch64; Aarch64 support is not complete yet and not all runtime profilers are supported, see architecture support).
For installation methods, jump to run as...
This section describes the possible options to control gProfiler's behavior.
gProfiler can produce output in two ways:
- Create an aggregated, collapsed stack samples file (`profile_<timestamp>.col`) and a flamegraph file (`profile_<timestamp>.html`). Two symbolic links (`last_profile.col` and `last_flamegraph.html`) always point to the last output files.

  Use the `--output-dir`/`-o` option to specify the output directory.

  If `--rotating-output` is given, only the last results are kept (available via `last_profile.col` and `last_flamegraph.html`). This can be used to avoid increasing gProfiler's disk usage over time. It is useful in conjunction with `--upload-results` (explained ahead): historical results are available in the Granulate Performance Studio, and the very latest results are available locally.

  `--no-flamegraph` can be given to avoid generating the `profile_<timestamp>.html` file; only the collapsed stack samples file will be created.

- Send the results to the Granulate Performance Studio for viewing online with filtering, insights, and more.

  Use the `--upload-results`/`-u` flag. Pass the `--token` option to specify the token provided by Granulate Performance Studio, and the `--service-name` option to specify an identifier for the collected profiles, as will be viewed in the Granulate Performance Studio. Profiles sent from numerous gProfilers using the same service name will be aggregated together.
Note: both flags can be used simultaneously, in which case gProfiler will create the local files and upload the results.
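The local collapsed file can be post-processed with a few lines of Python. The sketch below assumes the common collapsed-stack layout (semicolon-separated frames, then a space and a sample count on each line) and assumes that lines starting with `#` are metadata to skip; neither detail is specified above, so check them against your own `last_profile.col`. The function name `top_leaf_frames` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_leaf_frames(lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Sum sample counts per innermost (leaf) frame of collapsed stacks."""
    totals: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' lines carry no samples.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so frames containing spaces stay intact.
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # not in the "frames count" layout; skip it
        leaf = stack.split(";")[-1]
        totals[leaf] += int(count)
    return totals.most_common(limit)


# Hypothetical sample data, for illustration only.
sample = [
    "# hypothetical metadata line",
    "java;Main.run;Parser.parse 7",
    "python;app.py:handle;json.loads 3",
    "java;Main.run;Parser.parse 2",
]
print(top_leaf_frames(sample))
```

To try it on real output, pass an open file object for `last_profile.col` as `lines`.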
- `--profiling-frequency`: The sampling frequency of the profiling, in hertz.
- `--profiling-duration`: The duration of each profiling session, in seconds.

The default profiling frequency is 11 hertz. Using a higher frequency will lead to more accurate results, but will create greater overhead on the profiled system and programs.
For each profiling session (each profiling duration), gProfiler produces outputs (writing local files and/or uploading the results to the Granulate Performance Studio).
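When gProfiler is launched from a script, the output and frequency options can be assembled programmatically. The following is a minimal sketch that only builds an argument list. The helper name `build_gprofiler_args` and the example values are hypothetical, only flags documented in this README are used, and the check that uploading needs a service name is an assumption rather than documented behavior.

```python
from typing import List, Optional


def build_gprofiler_args(
    output_dir: Optional[str] = None,
    frequency_hz: Optional[int] = None,
    duration_s: Optional[int] = None,
    token: Optional[str] = None,
    service_name: Optional[str] = None,
    continuous: bool = False,
) -> List[str]:
    """Build a gProfiler argument list from the documented options."""
    args: List[str] = []
    if continuous:
        args.append("--continuous")
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    if frequency_hz is not None:
        args += ["--profiling-frequency", str(frequency_hz)]
    if duration_s is not None:
        args += ["--profiling-duration", str(duration_s)]
    if token is not None:
        # Assumption: an upload is only meaningful with a service name.
        if service_name is None:
            raise ValueError("service_name is required when uploading")
        args += ["--upload-results", "--token", token, "--service-name", service_name]
    return args


print(build_gprofiler_args(output_dir="/tmp/profiles", frequency_hz=11, duration_s=60))
```

The resulting list can be appended to the gProfiler executable path and passed to `subprocess.run`, which avoids shell quoting issues.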
- `--no-java` or `--java-mode disabled`: Disable profilers for Java.
- `--no-java-async-profiler-buildids`: Disable embedding of buildid+offset in async-profiler native frames (used when debug symbols are unavailable).
- `--no-python`: Alias of `--python-mode disabled`.
- `--python-mode`: Controls which profiler is used for Python.
  - `auto` - (default) try with PyPerf (eBPF), fall back to py-spy.
  - `pyperf` - Use PyPerf with no py-spy fallback.
  - `pyspy`/`py-spy` - Use py-spy.
  - `disabled` - Disable profilers for Python.

Profiling using eBPF incurs lower overhead and provides kernel and native stacks.
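To anticipate which profiler `--python-mode auto` is likely to end up with on a given host, a rough pre-check can be scripted. This sketch is not gProfiler's detection logic: it only encodes two constraints stated elsewhere in this document (PyPerf requires Linux 4.14 or higher and is supported on x86_64 only), and the function names are hypothetical. The real fallback happens at runtime and may be triggered by other failures.

```python
import platform
import re


def pyperf_plausible(kernel_release: str, machine: str) -> bool:
    """Rough pre-check for PyPerf: Linux 4.14+ on x86_64."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        return False
    version = (int(match.group(1)), int(match.group(2)))
    return machine == "x86_64" and version >= (4, 14)


def expected_python_profiler(kernel_release: str, machine: str) -> str:
    """Guess which profiler `--python-mode auto` would end up using."""
    return "pyperf" if pyperf_plausible(kernel_release, machine) else "pyspy"


# Report the guess for the current host.
print(expected_python_profiler(platform.release(), platform.machine()))
```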
- `--php-mode phpspy`: Enable PHP profiling with phpspy.
- `--no-php` or `--php-mode disabled`: Disable profilers for PHP.
- `--php-proc-filter`: Process filter (`pgrep`) to select PHP processes for profiling (this is phpspy's `-P` option).
- `--no-ruby` or `--ruby-mode disabled`: Disable profilers for Ruby.
- `--nodejs-mode`: Controls which profiler is used for NodeJS.
  - `none` - (default) no profiler is used.
  - `perf` - augment the system profiler (`perf`) results with jitdump files generated by NodeJS. This requires running your `node` processes with `--perf-prof` (and for Node >= 10, with `--interpreted-frames-native-stack`). See this NodeJS page for more information.
  - `attach-maps` - Generates a perf map using the node-linux-perf module. This module is injected at runtime. Requires the entrypoint of the application to be a CommonJS script (doesn't work for ES modules).
- `--perf-mode`: Controls the global perf strategy. Must be one of the following options:
  - `fp` - Use Frame Pointers for the call graph.
  - `dwarf` - Use DWARF for the call graph (adds the `--call-graph dwarf` argument to the `perf` command).
  - `smart` - Run both `fp` and `dwarf`, then choose the result with the highest average stack frame count, per process. This is the default.
  - `disabled` - Avoids running `perf` at all. See perf-less mode.
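The `smart` selection rule can be illustrated as follows. This is only a sketch of the idea, not gProfiler's code: it assumes the stacks of one process are held as collapsed strings mapped to sample counts, and it weights the average by sample count, which the description above does not specify. The names `average_depth` and `choose_call_graph` are hypothetical.

```python
from typing import Dict


def average_depth(stacks: Dict[str, int]) -> float:
    """Sample-weighted average number of frames per stack."""
    total = sum(stacks.values())
    if total == 0:
        return 0.0
    frames = sum(len(stack.split(";")) * count for stack, count in stacks.items())
    return frames / total


def choose_call_graph(fp: Dict[str, int], dwarf: Dict[str, int]) -> str:
    """Pick the unwinding method whose stacks are deeper on average."""
    return "dwarf" if average_depth(dwarf) > average_depth(fp) else "fp"


# Hypothetical data: a binary built without frame pointers yields
# truncated stacks under fp, so dwarf wins for this process.
fp_stacks = {"main;[unknown]": 10}
dwarf_stacks = {"main;handle;parse;memcpy": 10}
print(choose_call_graph(fp_stacks, dwarf_stacks))
```

Deeper stacks are used here as a proxy for more complete unwinding, which is why a process compiled without frame pointers would tend to end up with the `dwarf` result.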
gProfiler uses the Python `requests` package, which honors the standard HTTP proxy environment variables, e.g. `https_proxy` or `HTTPS_PROXY` (note: https and not http).
If running gProfiler as an executable and using `sudo`, make sure to run `sudo -E` if you have the environment variable defined (otherwise, `sudo` will drop it). Alternatively, you can run `sudo https_proxy=my-proxy /path/to/gprofiler ...`.
If running gProfiler as a Docker container, make sure to add `-e https_proxy=my-proxy` to the `docker run` command line (the spawned container does not inherit your set of environment variables; you have to pass it manually).
If you still get connection errors, make sure the proxy is indeed used by the profiler: in the `Failed to connect to server` error message you'll see the proxy used by the profiler (under `Proxy used:`).
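Before starting gProfiler, it can help to confirm which proxy variable the process will actually see, for example under `sudo` or inside a container. The sketch below inspects an environment mapping only; it does not call `requests`, and the lowercase-first precedence and the helper names are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def https_proxy_from_env(env: Mapping[str, str]) -> Optional[str]:
    """Return the HTTPS proxy named in `env`, or None if there is none."""
    # Assumption: the lowercase name wins when both spellings are set.
    for name in ("https_proxy", "HTTPS_PROXY"):
        value = env.get(name)
        if value:
            return value
    return None


def describe_proxy(env: Mapping[str, str]) -> str:
    """Summarize the proxy setup, flagging the http-vs-https pitfall."""
    proxy = https_proxy_from_env(env)
    if proxy is not None:
        return f"HTTPS proxy: {proxy}"
    if env.get("http_proxy") or env.get("HTTP_PROXY"):
        return "only http_proxy is set; set https_proxy as well"
    return "no proxy configured"


print(describe_proxy(os.environ))
```

Running this with and without `sudo -E` shows whether the variable survives the privilege change.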
By default, gProfiler sends logs to Granulate Performance Studio (when using the `--upload-results`/`-u` flag).
This behavior can be disabled by passing `--dont-send-logs` or by setting the environment variable `GPROFILER_DONT_SEND_LOGS=1`.
By default, gProfiler agent sends system metrics (CPU and RAM usage) and metadata to the Performance Studio.
The metadata includes system metadata like the kernel version and CPU count, and cloud metadata like the type of the instance you are running on.
Metrics collection is not enabled unless the `--upload-results`/`-u` flag is set.
When uploading is enabled, you can disable metrics and metadata by using the following parameters:
- Use `--disable-metrics-collection` to disable metrics collection
- Use `--disable-metadata-collection` to disable metadata collection
gProfiler can be run in a continuous mode, profiling periodically, using the `--continuous`/`-c` flag.
Note that when using `--continuous` with `--output-dir`, a new file will be created during each sampling interval.
Aggregations are only available when uploading to the Granulate Performance Studio.
This section lists the various execution modes for gProfiler (as a container, as an executable, etc...).
Run the following to have gProfiler running continuously, uploading to Granulate Performance Studio:
docker pull granulate/gprofiler:latest
docker run --name granulate-gprofiler -d --restart=on-failure:10 \
--pid=host --userns=host --privileged \
granulate/gprofiler:latest -cu --token="<TOKEN>" --service-name="<SERVICE NAME>" [options]
First, check if gProfiler is already running by running `pgrep gprofiler`. You should not see any output; if you do see any PIDs, gProfiler is running and must be stopped before starting it again (you can stop it with `sudo pkill -TERM gprofiler`).
Run the following to have gProfiler running continuously, in the background, uploading to Granulate Performance Studio:
wget https://github.com/Granulate/gprofiler/releases/latest/download/gprofiler_$(uname -m) -O gprofiler
sudo chmod +x gprofiler
sudo TMPDIR=/proc/self/cwd sh -c "setsid ./gprofiler -cu --token=\"<TOKEN>\" --service-name=\"<SERVICE NAME>\" [options] > /dev/null 2>&1 &"
sleep 1
pgrep gprofiler # make sure gprofiler has started
If `pgrep` doesn't find any process, try running without `> /dev/null 2>&1 &` so you can inspect the output and look for errors.
For non-daemon mode runs, you can remove the `setsid` and `> /dev/null 2>&1 &` parts.
The logs can then be viewed in their default location (`/var/log/gprofiler`).
`TMPDIR` is added because gProfiler unpacks executables to `/tmp` by default; this is done by `staticx`. For cases where `/tmp` is marked with `noexec`, we add `TMPDIR=/proc/self/cwd` to have everything unpacked in your current working directory, which must allow execution because gProfiler itself was started from it.
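Whether `/tmp` is mounted with `noexec` can be checked before launching. The following is a minimal sketch that parses text in the `/proc/mounts` layout (device, mount point, filesystem type, options); the helper names are hypothetical, and it is a diagnostic aid rather than anything gProfiler itself runs.

```python
import os
from typing import Optional


def mount_options_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the options of the mount that contains `path`.

    `mounts_text` is expected in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    best_point, best_options = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point, options = fields[1], fields[3]
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        # The longest matching mount point is the innermost mount;
        # on ties, later entries shadow earlier ones.
        if inside and len(point) >= len(best_point):
            best_point, best_options = point, options
    return best_options


def is_noexec(path: str, mounts_text: str) -> bool:
    """True if the mount containing `path` has the noexec option."""
    options = mount_options_for(path, mounts_text)
    return options is not None and "noexec" in options.split(",")


if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts_file:
        print("/tmp is noexec:", is_noexec("/tmp", mounts_file.read()))
```

If this reports `True`, the `TMPDIR=/proc/self/cwd` prefix shown in the command above is needed.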
The following platforms are currently not supported with the gProfiler executable:
- Alpine
Remark: container-based execution works and can be used in those cases.
You can generate a systemd service configuration that runs gProfiler as an executable (and therefore, bears the same known issues) by running:
curl -s https://raw.githubusercontent.com/Granulate/gprofiler/master/deploy/systemd/create_systemd_service.sh | GPROFILER_TOKEN=<TOKEN> GPROFILER_SERVICE=<SERVICE_NAME> bash
This script generates `granulate-gprofiler.service` in your working directory, and you can go ahead and install it by running:
systemctl enable $(pwd)/granulate-gprofiler.service
systemctl start granulate-gprofiler.service
For Databricks, the same installation instructions as specified in the running as an executable section can be used (make sure to run them in the initialization script of your node).
Additionally, two more flags need to be added to gProfiler's command line: `--disable-pidns-check --perf-mode=disabled`. You can add them right after the `--service-name` argument.
- `--disable-pidns-check` is required because gProfiler won't run in the init PID NS.
- `--perf-mode=disabled` is required because gProfiler will not have permissions to run system-wide `perf`, so we will profile only runtime processes, such as Java. See perf-less mode for more information.
See gprofiler.yaml for a basic template of a DaemonSet running gProfiler.
Make sure to insert the `GPROFILER_TOKEN` and `GPROFILER_SERVICE` variables in the appropriate location!
Like with the DaemonSet, make sure to insert the `GPROFILER_TOKEN` and `GPROFILER_SERVICE` variables in the appropriate location.
cd deploy/k8s/helm-charts
helm install --set gprofiler.token="GPROFILER_TOKEN" --set gprofiler.serviceName="GPROFILER_SERVICE" gprofiler .
# To view additional configuration options you can run:
helm show values .
- Go to ECS, and create a new task definition
- Choose EC2, and click `Next Step`
- Scroll to the bottom of the page, and click `Configure via JSON`
- Replace the JSON contents with the contents of the gprofiler_task_definition.json file and make sure you change the following values:
  - Replace `<TOKEN>` in the command line with the token you got from the gProfiler Performance Studio site.
  - Replace `<SERVICE NAME>` in the command line with the service name you wish to use.
- Note - if you wish to see the logs from the gProfiler service, be sure to follow AWS's guide on how to auto-configure logging, or to set it up manually yourself.
- Click `Save`
- Click `Create`

- Go to your ECS Clusters and enter the relevant cluster
- Click on `Services`, and choose `Create`
- Choose the `EC2` launch type and the `granulate-gprofiler` task definition with the latest revision
- Enter a service name
- Choose the `DAEMON` service type
- Click `Next step` until you reach the `Review` page, and then click `Create Service`
At the time of this writing, Fargate does not support `DAEMON` tasks (see this tracking issue).
Furthermore, Fargate does not allow using `"pidMode": "host"` in the task definition (see documentation of `pidMode` here). Host PID is required for gProfiler to be able to profile processes running in other containers (in case of Fargate, other containers under the same `containerDefinition`).
So in order to deploy gProfiler, we need to modify a container definition to include running gProfiler alongside the actual application. This can be done with the following steps:
- Modify the `command` and `entryPoint` parameters of your entry in the `containerDefinitions` array. The new command should download gProfiler and execute it in the background, and `entryPoint` will be `["/bin/bash"]`.

  For example, if your default `command` is `["python", "/path/to/my/app.py"]`, we will now change it to: `["-c", "(wget https://github.com/Granulate/gprofiler/releases/latest/download/gprofiler -O /tmp/gprofiler; chmod +x /tmp/gprofiler; /tmp/gprofiler -cu --token=<TOKEN> --service-name=<SERVICE NAME> --disable-pidns-check --perf-mode disabled) & python /path/to/my/app.py"]`.

  Make sure to:
  - Replace `<TOKEN>` in the command line with the token you got from the gProfiler Performance Studio site.
  - Replace `<SERVICE NAME>` in the command line with the service name you wish to use.

  This new command will start downloading gProfiler in the background, then run your application. Make sure to JSON-escape any characters in your command line! For example, `"` is replaced with `\"`.

  Additionally, we will set `entryPoint` to `["/bin/bash"]`. If you had used `entryPoint` prior to incorporating gProfiler, make sure to include it in the new `command`.

  About `--disable-pidns-check` and `--perf-mode disabled`: please see the explanation in running-on-databricks, as it applies here as well.

  gProfiler and its installation process will send their output to your container's stdout and stderr. After verifying that everything works, you can append `> /dev/null 2>&1` to the parenthesized gProfiler command (in this example, before the `& python ...`) to prevent it from spamming your container logs.

  This requires your image to have `wget` installed. You can make sure `wget` is installed, or substitute the `wget` command with `curl -SL https://github.com/Granulate/gprofiler/releases/latest/download/gprofiler --output /tmp/gprofiler`, or any other HTTP downloader you wish.

- Add `linuxParameters` to the container definition (this goes directly in your entry in `containerDefinitions`): `"linuxParameters": { "capabilities": { "add": [ "SYS_PTRACE" ] } }`

  `SYS_PTRACE` is required by various profilers, and Fargate by default denies it for containers.
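Because the wrapped command has to be both shell-safe and JSON-escaped, generating it can be less error-prone than editing it by hand. The following is a sketch under stated assumptions: the helper name `wrap_command_with_gprofiler` is hypothetical, it reuses the download URL and flags from the example above with `--perf-mode disabled` (the spelling given in the profiling options list), and `shlex.join` requires Python 3.8 or later.

```python
import json
import shlex
from typing import Dict, List


def wrap_command_with_gprofiler(
    app_command: List[str], token: str, service_name: str
) -> Dict[str, List[str]]:
    """Build `entryPoint` and `command` values that start gProfiler in the
    background and then run the original application command."""
    url = "https://github.com/Granulate/gprofiler/releases/latest/download/gprofiler"
    gprofiler = (
        f"wget {url} -O /tmp/gprofiler; chmod +x /tmp/gprofiler; "
        f"/tmp/gprofiler -cu --token={shlex.quote(token)} "
        f"--service-name={shlex.quote(service_name)} "
        "--disable-pidns-check --perf-mode disabled"
    )
    # shlex.join quotes each original argument for the shell.
    app = shlex.join(app_command)
    return {"entryPoint": ["/bin/bash"], "command": ["-c", f"({gprofiler}) & {app}"]}


fragment = wrap_command_with_gprofiler(
    ["python", "/path/to/my/app.py"], "<TOKEN>", "<SERVICE NAME>"
)
# json.dumps takes care of JSON-escaping quotes in the shell command.
print(json.dumps(fragment, indent=2))
```

The printed fragment can be merged into the relevant entry of `containerDefinitions`, next to the `linuxParameters` block.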
Alternatively, you can download gProfiler in your `Dockerfile` to avoid having to download it every time at run time. Then you just need to invoke it upon container start-up.
You can run a gProfiler container with `docker-compose` by using the template file in docker-compose.yml.
Start by replacing the `<TOKEN>` and `<SERVICE NAME>` with values in the `command` section:
- `<TOKEN>` should be replaced with your personal token from the gProfiler Performance Studio site (in the Install Service section)
- `<SERVICE NAME>` should be replaced with whatever service name you wish to use

Optionally, you can add more command line arguments to the `command` section. For example, if you wish to use the `py-spy` profiler, you can add `--python-mode pyspy` to the command line.
To run it, run the following command:
docker-compose -f /path/to/docker-compose.yml up -d
To run gProfiler on your cluster, you will need to add an initialization action that will install the agent on all of your workers when the cluster is created.
First, upload the gProfiler initialization action script file to your Google Cloud Storage bucket -
gsutil cp gprofiler_initialization_action.sh gs://<YOUR BUCKET>
If you don't have a Google Storage bucket, make sure you create one (documentation).
Then, create your Dataproc cluster with the `--initialization-actions` flag -
export TOKEN='<TOKEN>' && \
export SERVICE='<SERVICE NAME>' && \
gcloud dataproc clusters create <CLUSTER NAME> \
--initialization-actions gs://<YOUR BUCKET>/gprofiler_initialization_action.sh \
--metadata gprofiler-token="$TOKEN",gprofiler-service="$SERVICE",enable-stdout="1" --region <REGION>
Note - make sure to replace the placeholders with the appropriate values:
- Replace `<TOKEN>` in the command line with the token you got from the gProfiler Performance Studio site.
- Replace `<SERVICE NAME>` in the command line with the service name you wish to use.
- Replace `<YOUR BUCKET>` with the name of the bucket you uploaded the gProfiler initialization action script to.
- Replace `<CLUSTER NAME>` with the cluster name you wish to use.
- Replace `<REGION>` with the region you wish to use.
If you are experiencing issues with your gProfiler installation (such as no flamegraphs available in the Performance Studio after waiting for more than 1 hour), you can look at gProfiler's logs and see if there are any errors.
To see gProfiler's logs, you must enable its output by providing `enable-stdout="1"` in the cluster metadata when creating the Dataproc cluster. You can use the example above.
Wait at least 10 minutes after creating your cluster, and then you can SSH into one of your cluster instances via either Dataproc's web interface or the command line.
After connecting to your instance, run the following command:
tail -f /var/log/dataproc-initialization-script-0.log
If you have more than one initialization script, try running the command with an increasing number instead of `0` to find the appropriate gProfiler log file.
By default, gProfiler's output is written to the Dataproc initialization script output file (`/var/log/dataproc-initialization-script-{Incrementing number}.log`).
If you wish to disable this behaviour, change the `enable-stdout` metadata variable value to "0" (the default is "1").
To run gProfiler on your AWS EMR cluster, you should create a bootstrap action that will launch the gProfiler on each node upon bootstrap. The full process should be:
- Create a bootstrap script (bash script) that will launch the gProfiler upon bootstrap. The script can look like this:
#!/bin/bash
wget https://github.com/Granulate/gprofiler/releases/latest/download/gprofiler_$(uname -m) -O gprofiler
sudo chmod +x gprofiler
sudo sh -c "setsid ./gprofiler -cu --token=\"<TOKEN>\" --service-name=\"<SERVICE NAME>\" > /dev/null 2>&1 &"
Make sure to:
- Replace `<TOKEN>` with the token you got from the gProfiler Performance Studio site.
- Replace `<SERVICE NAME>` with the service name you wish to use.

- Upload the script to an S3 bucket, for example: `s3://my-s3-bucket/gprofiler-bootstrap.sh`
- Create the EMR cluster with a bootstrap action to run the bootstrap script. This can be done both from the AWS Console and the AWS CLI.
  - AWS Console Example:
    - Create an EMR Cluster from the AWS Console
    - Select `Go to advanced options`
    - Proceed to Step 3 - General Cluster Settings
    - Expand Bootstrap Actions section
    - Add Action of type Custom Action
    - Fill in script location with the appropriate script location on your S3, for example `s3://my-s3-bucket/gprofiler-bootstrap.sh`
  - AWS CLI Example: `aws emr create-cluster --name MY-Cluster ... --bootstrap-actions "Path=s3://my-s3-bucket/gprofiler-bootstrap.sh"`
Download the playbook and run it this way:
ansible-playbook -i ... gprofiler_playbook.yml --extra-vars "gprofiler_token='<TOKEN>'" --extra-vars "gprofiler_service='<SERVICE NAME>'"
Note - the playbook defaults to `hosts: all`; make sure to modify the pattern to your liking before running.
The playbook defines 2 more variables:
- `gprofiler_path` - path to download gProfiler to, `/tmp/gprofiler` by default.
- `gprofiler_args` - additional arguments to pass to gProfiler, empty by default. You can use it to pass, for example, `'--profiling-frequency 15'` to change the frequency.
gProfiler requires Python 3.6+ to run.
pip3 install -r requirements.txt
./scripts/copy_resources_from_image.sh
Then, run the following as root:
python3 -m gprofiler [options]
gProfiler invokes `perf` in system-wide mode, collecting profiling data for all running processes.
Alongside `perf`, gProfiler invokes runtime-specific profilers for processes based on these environments:
- Java runtimes (version 7+) based on the HotSpot JVM, including the Oracle JDK and other builds of OpenJDK like AdoptOpenJDK and Azul Zulu.
  - Uses async-profiler.
- The CPython interpreter, versions 2.7 and 3.5-3.10.
  - eBPF profiling (based on PyPerf) requires Linux 4.14 or higher; see Python profiling options for more info.
  - If eBPF is not available for whatever reason, py-spy is used.
- PHP (Zend Engine), versions 7.0-8.0.
  - Uses Granulate's fork of the phpspy project.
- Ruby (versions 1.9.1 to 3.0.1)
  - Uses Granulate's fork of the rbspy profiler.
- NodeJS (version >= 10 for functioning `--perf-prof`):
  - Uses `perf inject --jit` and NodeJS's ability to generate jitdump files. See NodeJS profiling options.
  - Can also generate perf maps at runtime.
- .NET runtime
  - Uses dotnet-trace.

The runtime-specific profilers produce stack traces that include runtime information (i.e., stacks of Java/Python functions), unlike `perf`, which produces native stacks of the JVM / CPython interpreter.
The runtime stacks are then merged into the data collected by `perf`, substituting the native stacks `perf` has collected for those processes.
| Runtime | x86_64 | Aarch64 |
|---|---|---|
| perf (native, Golang, ...) | ✔️ | ✔️ |
| Java (async-profiler) | ✔️ | ✔️ |
| Python (py-spy) | ✔️ | ✔️ |
| Python (PyPerf eBPF) | ✔️ | ❌ |
| Ruby (rbspy) | ✔️ | ✔️ |
| PHP (phpspy) | ✔️ | ✔️ (experimental) |
| NodeJS (perf) | ✔️ | ✔️ |
| .NET (dotnet-trace) | ✔️ (experimental) | ✔️ (experimental) |
It is possible to run gProfiler without using `perf`. This is useful where `perf` can't be used, for whatever reason (e.g. permissions). This mode is enabled by `--perf-mode disabled`.
In this mode, gProfiler uses runtime-specific profilers only, and their results are concatenated (instead of scaled into the results collected by `perf`). This means that, although the results from different profilers are viewed on the same graph, they are not necessarily of the same scale: you can compare the sample counts of Java to Java, but not Java to Python.
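The difference between scaling and concatenating can be made concrete with a small example. This is an illustration of the idea only, not gProfiler's merging code: it assumes each profiler's output for a process is a mapping from collapsed stack to sample count, and that scaling means rescaling a runtime profiler's counts so that they sum to the number of samples `perf` collected for that process. All names and numbers are hypothetical.

```python
from typing import Dict


def scale_to_perf(runtime_stacks: Dict[str, int], perf_samples: int) -> Dict[str, float]:
    """Rescale runtime-profiler counts so they sum to perf's sample count."""
    total = sum(runtime_stacks.values())
    if total == 0:
        return {}
    ratio = perf_samples / total
    return {stack: count * ratio for stack, count in runtime_stacks.items()}


def concatenate(*profiles: Dict[str, int]) -> Dict[str, int]:
    """Perf-less mode: keep every profiler's raw counts side by side."""
    merged: Dict[str, int] = {}
    for profile in profiles:
        for stack, count in profile.items():
            merged[stack] = merged.get(stack, 0) + count
    return merged


java = {"java;Main.run": 30, "java;Main.idle": 10}
python = {"python;app.handle": 400}

# With perf: Java's 40 raw samples are mapped onto perf's 20 samples.
print(scale_to_perf(java, perf_samples=20))
# Without perf: raw counts are kept, so Java and Python are not comparable.
print(concatenate(java, python))
```

In the concatenated result the Python stack dominates simply because that profiler reported more raw samples, which is why cross-runtime comparisons are not meaningful in perf-less mode.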
Note: this process builds from source all of the profilers used by gProfiler. It's recommended to use machines with at least 8 cores and 16 GB of RAM (memory is more important, as your build may fail with OOM if you have less memory available).
- x86_64: `./scripts/build_x86_64_container.sh -t gprofiler` will create a local image `gprofiler`.
- Aarch64: `./scripts/build_aarch64_container.sh`. If you're building on x86_64, you will first need to set up buildx for cross-architecture builds.

- x86_64: `./scripts/build_x86_64_executable.sh` will build the executable into `build/x86_64/gprofiler`.
- Aarch64: `./scripts/build_aarch64_executable.sh` will build the executable into `build/aarch64/gprofiler`. As with the Aarch64 container build, this can be used to cross-compile on x86_64; you just need to set up buildx for that (see notes above).
We welcome all feedback and suggestions through GitHub Issues.
- Update `__version__` in `__init__.py`.
- Create a tag with the same version (after merging the `__version__` update) and push it.
We recommend going through our contribution guide for more details.
- async-profiler by Andrei Pangin. See our fork.
- py-spy by Ben Frederickson. See our fork.
- bcc (for PyPerf) by the IO Visor project. See our fork.
- phpspy by Adam Saponara. See our fork.
- rbspy by the rbspy project. See our fork.
- dotnet-trace
[1]: To profile native programs that were compiled without frame pointers, make sure you use `--perf-mode smart` (which is the default). Read more about it in the Profiling options section.