Introduction

Default Version Support: ROS 2 Foxy, Fast-DDS 2.0.x. For older ROS 2 versions, search for the latest corresponding tag (for example, dashing and eloquent).

This performance test tool allows you to test performance and latency of various communication means like ROS 2, Apex.OS WaitSet, FastDDS, RTI Connext DDS, Connext DDS Micro, Eclipse Cyclone DDS and OpenDDS.

It can be extended to other communication frameworks easily.

A detailed description can be found here: Design Article

Building and running performance test

Installing dependencies

ROS 2: https://index.ros.org/doc/ros2/Installation

Additional dependencies are Java and others declared in the package.xml file

sudo apt-get install default-jre
rosdep install -y --from-paths performance_test --ignore-src

How to build

source <ros2_install_path>/setup.bash
mkdir -p perf_test_ws/src
cd perf_test_ws/src
git clone https://gitlab.com/ApexAI/performance_test.git
cd ..
colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash

How to run a single experiment

After building, a simple experiment can be run using the following.

Before you start, create a directory for the output and run the tool from inside it.

mkdir experiment
cd experiment
../install/performance_test/lib/performance_test/perf_test -c ROS2 -l log -t Array1k --max_runtime 30

At the end of the experiment, a CSV log file will be generated in the experiment folder (e.g. experiment/log_Array1k_<current_date>).
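When several experiments write into the same folder, it can help to pick up the newest log automatically. The following is a minimal sketch, assuming only that log file names start with a known prefix such as log_Array1k_ (as in the example above). The helper name latest_log is hypothetical and not part of the tool.

```python
from pathlib import Path
from typing import Optional


def latest_log(directory: str, prefix: str = "log_Array1k_") -> Optional[Path]:
    """Return the most recently modified file in `directory` whose name
    starts with `prefix`, or None if there is no such file."""
    candidates = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    ]
    if not candidates:
        return None
    # The newest file is the one with the largest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, latest_log("experiment") would return the path of the newest matching log in the experiment folder, which can then be passed to other scripts.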

Generating graphical plots

Plot results

To plot the results, you will need to install the perfplot tool from the apex_performance_plotter python module. See apex_performance_plotter for the list of dependencies.

pip3 install performance_test/helper_scripts/apex_performance_plotter

This tool will convert performance test log files into PDFs containing graphs of the results.

Note: Some of the dependencies of apex_performance_plotter (specifically pandas, at the time of writing) require python 3.6. It is possible to get apex_performance_plotter working with older dependencies that run with python 3.5, but that is beyond the scope of this document.

To plot a log file into a PDF file, pass the log file name to the perfplot executable:

perfplot <logfile_name>

This will generate a PDF file <logfile_name>.pdf that can be viewed in any PDF viewer.

Configuration options provided by the tool

The tool has a fully documented command line interface, which can be accessed by running perf_test --help.

~/perf_test_ws$ ./install/performance_test/lib/performance_test/perf_test --help

Allowed options:
  -h [ --help ]                        Print usage message.
  -l [ --logfile ] arg                 Optionally specify a logfile.
  -r [ --rate ] arg (=1000)            The rate data should be published.
                                       Defaults to 1000 Hz. 0 means publish as
                                       fast as possible.
  -c [ --communication ] arg           Communication plugin to use (ROS2,
                                       FastRTPS, ConnextDDS, ConnextDDSMicro,
                                       CycloneDDS, iceoryx, OpenDDS,
                                       ROS2PollingSubscription)
  -t [ --topic ] arg                   Specify a topic name to use. Only the
                                       pub/sub with the same topic name can
                                       communicate with each other.
  --msg arg                            Msg to use. Use --msg_list to get a
                                       list.
  --msg_list                           Prints list of available msg types and
                                       exits.
  --dds_domain_id arg (=0)             Sets the DDS domain id.
  --reliable                           Enable reliable QOS. Default is best
                                       effort.
  --transient                          Enable transient QOS. Default is
                                       volatile.
  --keep_last                          Enable keep last QOS. Default is keep
                                       all.
  --history_depth arg (=1000)          Set history depth QOS. Defaults to 1000.
  --disable_async                      Disables async. pub/sub.
  --max_runtime arg (=0)               Maximum number of seconds to run before
                                       exiting. Default (0) is to run forever.
  -p [ --num_pub_threads ] arg (=1)    Maximum number of publisher threads.
  -s [ --num_sub_threads ] arg (=1)    Maximum number of subscriber threads.
  --check_memory                       Prints backtrace of all memory
                                       operations performed by the middleware.
                                       This will slow down the application!
  --use_rt_prio arg (=0)               Set RT priority. Only certain platforms
                                       (i.e. Drive PX) have the right
                                       configuration to support this.
  --use_rt_cpus arg (=0)               Set RT cpu affinity mask. Only certain
                                       platforms (i.e. Drive PX) have the right
                                       configuration to support this.
  --use_single_participant             Uses only one participant per process.
                                       By default every thread has its own.
  --zero_copy                          Use zero copy transfer.
  --with_security                      Make nodes with deterministic names for
                                       use with security
  --roundtrip_mode arg (=None)         Selects the round trip mode (None, Main,
                                       Relay).
  --ignore arg (=0)                    Ignores first n seconds of the
                                       experiment.
  --disable_logging                    Disables experiment logging to stdout.
  --expected_num_pubs arg (=0)         Expected number of publishers for
                                       wait_for_matched
  --expected_num_subs arg (=0)         Expected number of subscribers for
                                       wait_for_matched
  --wait_for_matched_timeout arg (=30) Maximum time[s] to wait for matching
                                       publishers/subscribers. Defaults to 30s

Some things to note:

The --use_single_participant option should not be used, as it is obsolete and will be removed soon.
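When experiments are scripted, assembling the command line from named parameters tends to be less error-prone than editing long strings. The following is a minimal sketch that uses only options shown in the help output above. The function name build_command and its parameter names are hypothetical, and the executable path assumes the workspace layout used in this document.

```python
import shlex
from typing import List

# Path used throughout this document; adjust if your workspace differs.
PERF_TEST = "./install/performance_test/lib/performance_test/perf_test"


def build_command(communication: str, msg: str, topic: str,
                  max_runtime: int = 30, rate: int = 1000,
                  reliable: bool = False, keep_last: bool = False,
                  history_depth: int = 1000, logfile: str = "") -> List[str]:
    """Build an argument list for perf_test from named settings."""
    cmd = [
        PERF_TEST,
        "-c", communication,
        "--msg", msg,
        "-t", topic,
        "--max_runtime", str(max_runtime),
        "--rate", str(rate),
        "--history_depth", str(history_depth),
    ]
    # Boolean QoS switches are only added when requested.
    if reliable:
        cmd.append("--reliable")
    if keep_last:
        cmd.append("--keep_last")
    # The log file option is appended last, and only if a name is given.
    if logfile:
        cmd += ["-l", logfile]
    return cmd


# Print a shell-ready command line for inspection.
print(shlex.join(build_command("ROS2", "Array1k", "test_topic",
                               max_runtime=10, reliable=True, logfile="log")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.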

Implemented plugins

The performance test tool can measure the performance of a variety of communication middlewares from different vendors. In this case there is no rclcpp or rmw layer overhead over the publisher and subscriber routines. The following plugins are currently implemented:

RAW Plugin	Supported subscription	Supported transports	`--cmake-args` to pass when building performance_test	Communication mean (-c) to pass when running experiments	Supports zero copy?
FastDDS 2.0.x	Native DDS Code	UDP	`-DPERFORMANCE_TEST_FASTRTPS_ENABLED=ON`	FastRTPS	No
RTI Connext DDS 5.3.1+ ¹	Native DDS Code	SHMEM, UDP	`-DPERFORMANCE_TEST_CONNEXTDDS_ENABLED=ON`	ConnextDDS	No
Connext DDS Micro 3.0.2	Native DDS Code	INTRA, SHMEM	`-DPERFORMANCE_TEST_CONNEXTDDSMICRO_ENABLED=ON`	ConnextDDSMicro	Yes
Eclipse Cyclone DDS	Native DDS Code	UDP	`-DPERFORMANCE_TEST_CYCLONEDDS_ENABLED=ON`	CycloneDDS	No
OpenDDS 3.13.2	Native DDS Code	UDP	`-DPERFORMANCE_TEST_OPENDDS_ENABLED=ON`	OpenDDS	No
iceoryx 1.0 ²	iceoryx Posh subscriber	SHMEM	`-DPERFORMANCE_TEST_ICEORYX_ENABLED=ON`	iceoryx	Yes

¹ NOTE: you need to source an RTI Connext DDS environment: if RTI Connext DDS was installed with ROS 2 (Linux only):
source /opt/rti.com/rti_connext_dds-5.3.1/setenv_ros2rti.bash
If RTI Connext DDS is installed separately, you can source the following script to set the environment:
source <connextdds_install_path>/resource/scripts/rtisetenv_<arch>.bash

² NOTE: The iceoryx plugin is not a DDS implementation. The DDS-specific options (such as domain ID, durability, and reliability) do not apply. For the iceoryx plugin, RouDi must be running.

To use any of these supported plugins, refer to the table above for the CMake arguments to pass when building the tool, and specify the corresponding communication mean (-c option) when running the experiment.

For example, to run a performance test with the Connext DDS Micro plugin, build performance_test with the following command:

colcon build --cmake-clean-cache --cmake-args -DCMAKE_BUILD_TYPE=Release -DPERFORMANCE_TEST_CONNEXTDDSMICRO_ENABLED=ON

Then run the performance test with the ConnextDDSMicro plugin:

./install/performance_test/lib/performance_test/perf_test -c ConnextDDSMicro -l log --msg Array1k -t test_topic --max_runtime 10
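The pairing between the -c value and the CMake flag in the table above can be captured in a small lookup, so that a script builds the tool with the right plugin before running it. This is a sketch only: the dictionary simply restates the table, and the function name colcon_build_command is hypothetical.

```python
from typing import List

# Communication mean (-c value) -> CMake flag, copied from the plugin table.
CMAKE_FLAG = {
    "FastRTPS": "-DPERFORMANCE_TEST_FASTRTPS_ENABLED=ON",
    "ConnextDDS": "-DPERFORMANCE_TEST_CONNEXTDDS_ENABLED=ON",
    "ConnextDDSMicro": "-DPERFORMANCE_TEST_CONNEXTDDSMICRO_ENABLED=ON",
    "CycloneDDS": "-DPERFORMANCE_TEST_CYCLONEDDS_ENABLED=ON",
    "OpenDDS": "-DPERFORMANCE_TEST_OPENDDS_ENABLED=ON",
    "iceoryx": "-DPERFORMANCE_TEST_ICEORYX_ENABLED=ON",
}


def colcon_build_command(communication: str) -> List[str]:
    """Return the colcon build argument list for the given plugin."""
    if communication not in CMAKE_FLAG:
        raise ValueError(f"No native plugin flag known for {communication!r}")
    return [
        "colcon", "build", "--cmake-clean-cache",
        "--cmake-args", "-DCMAKE_BUILD_TYPE=Release",
        CMAKE_FLAG[communication],
    ]


print(" ".join(colcon_build_command("ConnextDDSMicro")))
```

Raising an error for unknown names catches typos in the -c value before a long build starts.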

Supported rmw implementations

The performance_test tool can also measure the performance of an application with the ROS 2 layers included. For example, the following configuration can be tested: RTI Connext Micro + ROS2PollingSubscription rclcpp + rmw_apex_dds. The performance_test tool supports ROS 2 Foxy.

The following plugins with a ROS middleware interface are currently supported:

RMW Implementation	Supported subscription	Supported transports	`--cmake-args` to pass when building performance_test	Communication mean (-c) to pass when running experiments
rmw_fastrtps_cpp	ROS 2 Callback, Apex.OS WaitSet	UDP	Nothing for ROS 2 Callback (`-DPERFORMANCE_TEST_CALLBACK_EXECUTOR_ENABLED=ON` is set by default) `-DPERFORMANCE_TEST_POLLING_SUBSCRIPTION_ENABLED=ON` (for using Apex.OS waitsets)	ROS2 ROS2PollingSubscription (for using Apex.OS waitsets)
rmw_apex_dds (Apex.AI proprietary rmw implementation)	ROS 2 Callback, Apex.OS WaitSet	INTRA, SHMEM	Nothing for ROS 2 Callback (`-DPERFORMANCE_TEST_CALLBACK_EXECUTOR_ENABLED=ON` is set by default) `-DPERFORMANCE_TEST_POLLING_SUBSCRIPTION_ENABLED=ON` (for using Apex.OS waitsets)	ROS2 ROS2PollingSubscription (for using Apex.OS waitsets)
rmw_connext_cpp	ROS 2 Callback	SHMEM, UDP	Nothing for ROS 2 Callback (`-DPERFORMANCE_TEST_CALLBACK_EXECUTOR_ENABLED=ON` is set by default)	ROS2
rmw_cyclonedds_cpp	ROS 2 Callback	UDP	Nothing for ROS 2 Callback (`-DPERFORMANCE_TEST_CALLBACK_EXECUTOR_ENABLED=ON` is set by default)	ROS2

Note:

The DDS implementation that Apex.OS has been compiled with (rmw_apex_dds or rmw_cyclone_dds) is automatically linked when the performance_test tool is built with Apex.OS.

The ROS2PollingSubscription option only works if Apex.OS is present.

Apex.OS Cert does not support the ROS 2 Callback communicator. When building with Apex.OS Cert, you must explicitly disable the ROS 2 Callback communicator by setting -DPERFORMANCE_TEST_CALLBACK_EXECUTOR_ENABLED=OFF.

Zero copy transfer

The performance_test tool can also measure the performance of an application that uses zero copy transfer. With zero copy transfer, the publisher requests a loan from a pre-allocated shared memory pool, where it writes the sample. The subscriber reads the sample from that same location. To enable the zero copy features in this tool, add the CMake arg:

colcon build --cmake-clean-cache --cmake-args -DPERFORMANCE_TEST_ZERO_COPY_ENABLED=ON

Zero copy transfer is an Inter-Process Communication mechanism. When running, use the --zero_copy argument for both the publisher and subscriber processes:

./install/performance_test/lib/performance_test/perf_test -c ROS2PollingSubscription --msg Array1k -t test_topic --max_runtime 30 -p 1 -s 0 --zero_copy &
./install/performance_test/lib/performance_test/perf_test -c ROS2PollingSubscription --msg Array1k -t test_topic --max_runtime 30 -p 0 -s 1 --zero_copy

Not all of the native DDS plugins support zero copy transfer. The Implemented Plugins Table indicates which plugins support zero copy transfer.

This tool also supports zero copy transfer for RMW implementations, with the ROS2PollingSubscription communication mean, via the rclcpp::Publisher::borrow_loaned_message API. You can read more about loaned messages in ROS 2 here.

Batch run experiments (for advanced users)

Multiple experiments can be run using this python script:

python3 src/performance_test/performance_test/helper_scripts/run_experiment.py

You need to edit the python script to call the performance test tool with the desired configurations.
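To illustrate what a batch of configurations might look like, the following sketch enumerates command lines for every combination of communication mean and publishing rate. It is not the content of run_experiment.py. The function name sweep_commands and the log file naming scheme are hypothetical, and only options from the help output above are used.

```python
import itertools
from typing import Iterable, Iterator, List

PERF_TEST = "./install/performance_test/lib/performance_test/perf_test"


def sweep_commands(communications: Iterable[str], rates: Iterable[int],
                   msg: str = "Array1k", topic: str = "test_topic",
                   max_runtime: int = 30) -> Iterator[List[str]]:
    """Yield one perf_test argument list per (communication, rate) pair."""
    for communication, rate in itertools.product(communications, rates):
        yield [
            PERF_TEST,
            "-c", communication,
            "--msg", msg,
            "-t", topic,
            "--rate", str(rate),
            "--max_runtime", str(max_runtime),
            # Hypothetical naming scheme so each run gets its own log.
            "-l", f"log_{communication}_{rate}hz",
        ]


# List the commands without running them.
for command in sweep_commands(["ROS2", "FastRTPS"], [100, 1000]):
    print(" ".join(command))
```

Each yielded list could be handed to subprocess.run in sequence, so that experiments do not compete with each other for CPU time.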

Running experiments intraprocess vs running experiments interprocess

The tool can run experiments either in intraprocess composition, where the publisher and subscriber threads are in the same process, or in interprocess composition, where the publisher and subscriber run in different processes. This is useful for testing the performance of different transports such as Micro INTRA, UDP and SHMEM.

Let's take an example of a single publisher and single subscriber:

./install/performance_test/lib/performance_test/perf_test -c ROS2 -l log --msg Array1k -t test_topic --max_runtime 30 --num_sub_threads 1 --num_pub_threads 1

which is the same as running with the defaults:

./install/performance_test/lib/performance_test/perf_test -c ROS2 -l log --msg Array1k -t test_topic --max_runtime 30

This is an example of running the experiments in intraprocess composition. Connext Micro, as per Apex.OS, is configured to use Micro INTRA in this setting. FastDDS and other supported DDS implementations use UDP by default.

To run the publisher and subscriber in different processes, run the tool twice simultaneously. Start the first instance of the tool like this:

./install/performance_test/lib/performance_test/perf_test -c ROS2 -l log --msg Array1k -t test_topic --max_runtime 30 --num_sub_threads 0 --num_pub_threads 1

This is the publisher process. To run the subscriber, open a second terminal window and start a second instance of the tool like this:

./install/performance_test/lib/performance_test/perf_test -c ROS2 -l log --msg Array1k -t test_topic --max_runtime 30 --num_sub_threads 1 --num_pub_threads 0

This is the subscriber process. The tool supports running multiple subscribers at once, so --num_sub_threads in the subscriber process can also be set to a value greater than one.

This is an example of running the experiments in interprocess composition. Connext Micro, as per Apex.OS, is configured to use SHMEM in this setting. FastDDS and other supported DDS implementations use UDP by default.

Note: In Inter process composition the CPU and Resident Memory measurements are logged separately for the publisher and subscriber processes.
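The two-terminal procedure above can also be driven from a single script. The following is a minimal sketch, assuming the executable path used in this document. The function names interprocess_commands and run_pair are hypothetical, and the thread counts mirror the publisher and subscriber commands shown above.

```python
import subprocess
from typing import List, Tuple

PERF_TEST = "./install/performance_test/lib/performance_test/perf_test"


def interprocess_commands(communication: str, msg: str, topic: str,
                          max_runtime: int = 30,
                          num_subs: int = 1) -> Tuple[List[str], List[str]]:
    """Return (publisher_cmd, subscriber_cmd) for interprocess composition."""
    common = [
        PERF_TEST, "-c", communication, "-l", "log",
        "--msg", msg, "-t", topic, "--max_runtime", str(max_runtime),
    ]
    # Publisher process: one publisher thread, no subscriber threads.
    publisher = common + ["--num_sub_threads", "0", "--num_pub_threads", "1"]
    # Subscriber process: no publisher threads, one or more subscribers.
    subscriber = common + ["--num_sub_threads", str(num_subs),
                           "--num_pub_threads", "0"]
    return publisher, subscriber


def run_pair(publisher: List[str], subscriber: List[str]) -> Tuple[int, int]:
    """Start both processes at once and wait for both to exit."""
    pub = subprocess.Popen(publisher)
    sub = subprocess.Popen(subscriber)
    return pub.wait(), sub.wait()
```

Calling run_pair(*interprocess_commands("ROS2", "Array1k", "test_topic")) would start both processes and return their exit codes once --max_runtime has elapsed.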

Relay mode

Testing latency between multiple machines is difficult, as it is hard to precisely synchronize clocks between them. To overcome this issue, performance test supports a relay mode, which allows for a round-trip style of communication.

On the main machine:

./install/performance_test/lib/performance_test/perf_test -c ROS2 --msg Array1k -t test_topic --roundtrip_mode Main

On the relay machine:

./install/performance_test/lib/performance_test/perf_test -c ROS2 --msg Array1k -t test_topic --roundtrip_mode Relay

Note: On the main machine the round trip latency is reported and will be roughly double the latency compared to the latency reported in non-relay mode.
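If a one-way figure is needed for comparison with non-relay results, a rough estimate can be derived from the reported round-trip value. This sketch assumes that both directions take about the same time and that the relay adds negligible processing delay, neither of which is guaranteed. The function name and the input values are hypothetical.

```python
def estimate_one_way_latency_ms(round_trip_ms: float) -> float:
    """Estimate one-way latency as half of the round-trip latency.

    Assumes symmetric paths and negligible relay processing time.
    """
    return round_trip_ms / 2.0


# Hypothetical round-trip samples in milliseconds.
samples_ms = [1.0, 1.5, 2.0]
print([estimate_one_way_latency_ms(s) for s in samples_ms])
```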

Save results to a SQL database

The tool also gives you the ability to persist the performance test results in a SQL compatible database.

See Add SQL support readme for instructions and implementation details.

Memory analysis

You can use OSRF memory tools to find memory allocations in your application. To enable them, follow these steps, assuming you have already compiled performance test:

Enter your workspace: cd perf_test_ws/src
Clone the OSRF memory tools: git clone https://github.com/osrf/osrf_testing_tools_cpp.git
Build everything: cd .. && colcon build --cmake-args -DCMAKE_BUILD_TYPE=Release
Preload the memory library to make diagnostics work: export LD_PRELOAD=$(pwd)/install/osrf_testing_tools_cpp/lib/libmemory_tools_interpose.so
Run with memory check enabled: ./install/performance_test/lib/performance_test/perf_test -c ROS2 -l log --msg Array1k -t test_topic --max_runtime 10 --check_memory

Note: Enabling this feature will cause a huge performance impact.

Custom environment data

You can set the APEX_PERFORMANCE_TEST environment variable before running performance test to add custom data to the output CSV file. This information will then also be visible in the files outputted by the plotter script. Please use the JSON format to pass the values.

Example:

export APEX_PERFORMANCE_TEST="
{
\"My Version\": \"1.0.4\",
\"My Image Version\": \"5.2\",
\"My OS Version\": \"Ubuntu 16.04\"
}
"
./install/performance_test/lib/performance_test/perf_test -c ROS2 --msg Array1k -t test_topic
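When the tool is launched from a script, generating the JSON with a serializer avoids the manual quote escaping shown above. The following is a minimal sketch. The helper name make_env is hypothetical, while the APEX_PERFORMANCE_TEST variable name and the example keys come from this section.

```python
import json
import os
from typing import Dict, Optional


def make_env(custom_data: Dict[str, str],
             base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with APEX_PERFORMANCE_TEST set
    to the JSON encoding of `custom_data`."""
    env = dict(os.environ if base_env is None else base_env)
    env["APEX_PERFORMANCE_TEST"] = json.dumps(custom_data)
    return env


env = make_env({
    "My Version": "1.0.4",
    "My Image Version": "5.2",
    "My OS Version": "Ubuntu 16.04",
})
print(env["APEX_PERFORMANCE_TEST"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting perf_test, leaving the environment of the calling shell unchanged.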

Troubleshooting

Running the performance test may print an error such as the following: ERROR: You must compile with FastRTPS support to enable FastDDS as communication mean.

This means that the performance test needs to be compiled with --cmake-args -DPERFORMANCE_TEST_FASTRTPS_ENABLED=ON to switch from ROS 2 to FastDDS.

Literature

We have attempted to write a white paper with the goal of explaining how to do fair and unbiased performance testing, based on the performance testing framework that we built at Apex.AI and the experience that we gathered in the past 1.5 years. Here is a link to the paper.
