Benchmark - overhead of instrumented code without a tracer
jpkrohling opened this issue · 10 comments
Create a performance test that assesses the overhead of tracing with a noop tracer against a simple application. Scenarios to measure/test:
- Simple single-threaded Java application with the NoopTracer, creating N spans
- Simple Spring Boot application with a couple of endpoints and a couple of beans, each layer generating N spans. This test should handle several concurrent requests, to exercise a multi-threading scenario
Other ideas are welcome. The main goal is to verify that the hot spots in the opentracing-api and opentracing-util modules perform well, or to highlight the parts where performance could be improved.
For one or more scenarios, JMH might be useful:
http://openjdk.java.net/projects/code-tools/jmh/
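As an illustration, a minimal JMH sketch for scenario 1 could look like this (class and method names here are hypothetical, not prescribed):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.noop.NoopTracerFactory;

// Minimal sketch: measures the cost of creating and finishing a span
// with the NoopTracer. Returning the span prevents dead-code elimination.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class NoopSpanBenchmark {

    private final Tracer tracer = NoopTracerFactory.create();

    @Benchmark
    public Span createSpan() {
        Span span = tracer.buildSpan("benchmarked-operation").start();
        span.finish();
        return span;
    }
}
```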
This is more an instrumentation task, and it is hard to know what numbers are good, since the design of the tracer library itself impacts efficiency. One way is to compare against non-OT or non-OT-bridged instrumentation.
For example, I'd expect the OT-bridged instrumentation using Brave to be far less efficient than the native Brave instrumentation, due to some design problems highlighted over the years. However, a lot of the sources of overhead will be outside this repo, for example the practice of routinely walking stack traces in instrumentation projects. So the results will reflect both the side effects of design decisions here and of design decisions in the instrumentation.
Regardless, some work here will be helpful. Just make sure that when you test, you cover the base case, the unsampled/noop case, the nominal case, and error case scenarios.
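For instance, the scenarios could be parameterized in a single JMH benchmark; a hypothetical sketch (MockTracer here is just a stand-in for a real, sampling tracer):

```java
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import io.opentracing.Tracer;
import io.opentracing.mock.MockTracer;
import io.opentracing.noop.NoopTracerFactory;

// Hypothetical sketch: one state object, parameterized over the scenarios
// above, so each case is measured under identical benchmark conditions.
@State(Scope.Benchmark)
public class ScenarioState {

    @Param({"base", "noop", "nominal", "error"})
    public String scenario;

    public Tracer tracer;

    @Setup
    public void setUp() {
        // "base" would skip tracing entirely in the benchmark method;
        // the other scenarios pick the tracer to exercise.
        tracer = "noop".equals(scenario)
                ? NoopTracerFactory.create()
                : new MockTracer(); // stand-in for a real, sampling tracer
    }
}
```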
And another note: even Brave will be far less efficient than native agents like Instana, for example, which is very unlikely to use OT for things like servlet instrumentation. It would be good for the general public to know how something performs when there is no requirement to use OT to achieve the goal. For example, the Instana agent can trace servlets yet still supply a bridge to the OT layer for ad-hoc tracing. In this way only the OT parts will be hot spots. cc @CodingFabian
You could also do a similar comparison with other agents, like Elasticsearch's, as at least that one supports a (mostly) garbage-free design and is OSS. cc also @felixbarny
I've been working on the benchmark tests related to this issue.
The source code for scenario 1, the single-threaded Java application, is located here.
These are the results of one execution of these tests.
The tests were executed in a personal notebook with these characteristics:
Model Name: MacBook Pro
Processor Name: Intel Core i5
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Memory: 8 GB
I'd appreciate any feedback to improve these performance tests.
The string concatenations are likely to be eliminated by JIT's dead code analysis. I'd suggest returning the string from the benchmark method so that JMH can properly put them in a black hole. I'm also unsure why you want to benchmark string concatenations. IMO agents/tracers should avoid string concatenations and object allocations as much as possible. Or did you want to test the performance of creating spans? But then I don't quite understand why the benchmark is also about concatenating strings.
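For illustration, the difference would look something like this (a sketch, not your actual benchmark code):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class DeadCodeExamples {

    // Likely to be optimized away by the JIT: the result is never used.
    @Benchmark
    public void concatDeadCode() {
        String s = "result-" + System.nanoTime();
    }

    // Safe: JMH consumes the returned value, so it cannot be eliminated.
    @Benchmark
    public String concatReturned() {
        return "result-" + System.nanoTime();
    }

    // Alternatively, sink intermediate values into a Blackhole explicitly.
    @Benchmark
    public void concatBlackhole(Blackhole bh) {
        bh.consume("result-" + System.nanoTime());
    }
}
```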
Just my 2c :)
Hi @felixbarny, thanks for your comments! :)
I want to test the performance of creating spans, and I wanted to start with a simple example.
I modified the benchmarks with your suggestion, and indeed the numbers make more sense now.
Please let me know if you have any other ideas to implement in this set of tests.
I've been improving the benchmark tests:
1 - Single-threaded Java application
Measuring different ways to concatenate strings, and comparing each of those ways combined with creating spans using the NoopTracer, MockTracer, and JaegerTracer. The new results are located here.
2 - Simple Spring Boot application
Implementing a simple billing example, with services to create invoices, add line items, compute taxes, notify customers by email, and issue invoices. The repository of invoices is kept in memory using a ConcurrentHashMap. This example also compares the same service logic with spans created by the NoopTracer and the JaegerTracer; the pattern is sketched below.
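Each service layer wraps its business logic in a span, roughly like this (a simplified sketch with illustrative class and operation names, not the exact benchmark code):

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import org.springframework.stereotype.Service;

// Simplified sketch of one layer of the billing example: the injected
// tracer (NoopTracer or JaegerTracer, depending on the benchmark run)
// creates a span around the business logic being measured.
@Service
public class TaxService {

    private final Tracer tracer;

    public TaxService(Tracer tracer) {
        this.tracer = tracer;
    }

    public double computeTaxes(double subtotal) {
        Span span = tracer.buildSpan("compute-taxes").start();
        try {
            return subtotal * 0.21; // illustrative tax computation
        } finally {
            span.finish();
        }
    }
}
```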
These are the results of an execution of these tests with 1 thread, and these with 5 threads.
The tests were executed on the same personal notebook described above.
Any feedback is welcome! :)
@gsoria Thanks for the details.
Personally I think the results from the Spring Boot app are more useful, as the overhead of tracing is measured in the context of communication between services.
From the results (21-31-17), it looks like Jaeger is adding about 30% overhead - although it is interesting that the NoopTracer shows slightly better performance than the non-instrumented version.
What configuration was used for the JaegerTracer? Is it reporting spans via UDP to the agent, or via HTTP to the collector directly?
Hi @objectiser, thanks for your review! :)
The configuration used for Jaeger reports spans via UDP, as you can see in the code.
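For reference, the UDP reporter setup with the Jaeger Java client looks roughly like this (a sketch; the service name and sampler settings are illustrative, not necessarily the exact values from the benchmark):

```java
import io.jaegertracing.Configuration;
import io.jaegertracing.Configuration.ReporterConfiguration;
import io.jaegertracing.Configuration.SamplerConfiguration;
import io.jaegertracing.Configuration.SenderConfiguration;
import io.opentracing.Tracer;

public class JaegerSetup {

    public static Tracer createTracer() {
        // Sample every trace so the benchmark exercises the full
        // reporting path, and send spans over UDP to a local agent.
        SamplerConfiguration sampler = SamplerConfiguration.fromEnv()
                .withType("const")
                .withParam(1);
        SenderConfiguration sender = SenderConfiguration.fromEnv()
                .withAgentHost("localhost")
                .withAgentPort(6831); // default jaeger-agent UDP port
        ReporterConfiguration reporter = ReporterConfiguration.fromEnv()
                .withSender(sender);
        return new Configuration("billing-benchmark")
                .withSampler(sampler)
                .withReporter(reporter)
                .getTracer();
    }
}
```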
To be sure about the overhead percentage, I modified the Billing example, removing the in-memory persistence of invoices, and re-ran the tests.
These are the results with 1 thread and with 5 threads.
I think the results where the NoopTracer shows better performance than the non-instrumented version are because the numbers were skewed by GC.
Hi @gsoria
Sorry, I hadn't looked at the actual code :) - I noticed that currently the services are all beans, so they run in the same VM.
Do you also have plans to extend the benchmarks to test performance when those services communicate via HTTP (i.e. as REST services)? I think this would be good from a comparison perspective, to see how much overhead OT adds for communicating services.
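For such a comparison, the OT-specific cost on the client side would mainly be injecting the span context into the outgoing HTTP headers, roughly like this (a sketch using the core propagation API, assuming opentracing-api 0.32+ where TextMapAdapter is available):

```java
import java.util.HashMap;
import java.util.Map;

import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;

public class ClientPropagation {

    // Sketch: inject the span context into a map, which would then be
    // copied into the headers of the outgoing HTTP request.
    public static Map<String, String> injectHeaders(Tracer tracer, Span span) {
        Map<String, String> headers = new HashMap<>();
        tracer.inject(span.context(), Format.Builtin.HTTP_HEADERS,
                new TextMapAdapter(headers));
        return headers;
    }
}
```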