jaegertracing/jaeger-client-python

Remove dependency on Tornado

yurishkuro opened this issue · 12 comments

The only part of Tornado that we really depend on is the implementation of the queue used to pass spans between the application and the background thread. I think it would be better to remove that dependency altogether and manually create a background thread, similar to how lightstep tracer does it. It looks like this was first proposed in #31 (comment), but never made it to an issue.

This would supersede #201, #187, #31, #191

Upon digging more for this issue related to 'dependency on tornado', I could see the impact on 5 core files in total as of now: ioloop_util.py, local_agent.py, reporter.py, sampler.py & throttler.py
This could be a big change.

gunicorn with eventlet worker class makes introducing this library very difficult. Shame I have a dependency on eventlet in my codebase. If any advice, please let me know

Upon digging more for this issue related to 'dependency on tornado', I could see the impact on 5 core files in total as of now: ioloop_util.py, local_agent.py, reporter.py, sampler.py & throttler.py
This could be a big change.

Is it possible to make the changes gradually?
Maybe it's a good idea to make several independent PRs removing Tornado usage in one file at a time (of course ensuring that nothing breaks). And in the end, clean up dependencies.
That could help to decrease efforts and simplify code review.

This can certainly be done piecemeal, e.g. start with refactoring of the reporting functionality, which is the most important one.

The only downside of implementing reporter, sampler, throttler completely separately is that they all have some background activity, and having each one start a thread for that is not a great solution. Although I could be wrong - are many threads "ok" in Python? If not, then we may need some Executor abstraction, similar to the existing io_loop, which can be shared amongst the components.

zrayn commented

A couple of questions:

It looks like Tornado uses a non-thread-safe queue to perform its work, do you want to keep this semantic or switch to a thread-safe implementation (that can leverage more of the standard library) and incur whatever overhead comes with that?

The python 2.7 compatibility pushes a Tornado replacement into a threaded implementation as opposed to an async/await implementation, is this the desired direction or would it be preferable to drop the python 2.7 compatibility (given there are less than 3 month of official support for python 2)?

That's an interesting thought. Yes, perhaps an async but single threaded implementation is the way to go. But I think this needs some benchmarking. If async has similar performance, then a parallel set of components can be created for py3.

@yurishkuro @martin-kieliszek
Hi everyone, I have a problem with Openstack Osprofiler and Jaeger client.
Because Openstack uses the eventlet library, it doesn't work with Tornado in Jaeger.
Does anyone have a solution to this problem? Thanks.

I've had a bit of a look at this, and it seems relatively easy to be able to move all the tornado specific logic into a single class. AFAICT there are basically two things that tornado does:

  1. Periodic calls to fetch JSON config from a service for the sampler and throttler.
  2. Actually reporting spans.

One thing I realised while doing that was its relatively easy to just insert your own Reporter, at which point if you're not using the remote sampler/throttler stuff then nothing will create the tornado threads, and so no tornado overhead.

I've actually written a quick demo in rust that implements the Reporter interface and uses a native thread (its just a thin wrapper around the auto generated thrift code), which in our production has reduced the overhead from 10% to ~1% CPU. In a simple benchmark of reporting spans in a loop it increases throughput ~2x. Using a native thread has the advantage that the loop reporting spans doesn't take a lock on the GIL and so lets Python on the main thread to run. (I wouldn't necessarily recommend using it, as it seems to occasionally throw exceptions or send malformed packets to the jaeger agent).

I think one plan of attack here would be to make it easier to add different Reporter/Sampler/Throttler types, e.g. by ensuring the passed in spans, tags, etc are always in the thrift generated object formats, rather than custom formats. That would be a fairly small amount of work, and would also allow you to ship a twisted backed Reporter as well (since thrift will generate twisted code too).

@erikjohnston reporter & sampler are already pluggable, the tracer takes them as arguments, and the tracer itself has no tornado dependency afair.

The Span type passed to the reporter is the native Span object, not the Thrift-generated type, even though some of the data elements in the Span are thrift-like (namely, tags are already converted to Thrift-compatible object because having an intermediate representation was detrimental to performance). There are pros and cons to making the API between tracer and reporter to be a Thrift object, pro is stability of the API, cons are (1) it makes other reporting formats like protobuf or JSON less efficient, (2) it will be a breaking change because InMemoryReporter that's often used for unit testing of instrumentation exposes the native Span for introspection.

If there is a guarantee on the API interface of Reporter and co then that is fab! I've not seen that mentioned anywhere (or even examples initialising the Tracer directly), though I may be missing something.

Probably because the recommended way to create tracer is via Config, but creating it manually requires sampler and reporter, so the capability is already there:

def __init__(
self, service_name, reporter, sampler, metrics=None,
metrics_factory=None,

I'd love to see this happen, since my project is not using tornado but is suffering from DeprecationWarnings as a result of it. If I am reading the warnings correctly, the combination of jaeger-client==4.3.0 and opentracing-instrumentation==3.3.1 with its tornado>=4.1,<6 dependency will cause my package to not work in Python 3.9.

Here's how to reproduce the deprecation warning:

requirements.txt

opentracing==2.3.0
opentracing-instrumentation==3.3.1
jaeger-client==4.3.0

repro.py

try:
    import jaeger_client.config
except DeprecationWarning as e:
    print(e)
else:
    raise AssertionError(
        "Deprecation warning not raised, are you sure you ran this with the '-Werror' flag? "
        "Expected command: python -Werror repro.py"
    )

Executing in the environment created from that requirements.txt file:

$ python --version
Python 3.7.7

$ python -Werror repro.py 
Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working