thefactory/marathon-python

[Profiling] Humongous CPU with event_stream


I recently developed a listener for Marathon events using your client's event_stream() method (which is quite awesome, by the way).

The idea is simple: listen for the "deployment_step_success" event and execute some internal business code when it happens.
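In outline, the listener looks something like this (a simplified sketch; handle_deployment_step and the Marathon URL are placeholders for my actual setup):

from marathon import MarathonClient

def handle_deployment_step(event):
    # Placeholder for the internal business code.
    print("Deployment step succeeded!")

client = MarathonClient("http://marathon.example.com:8080")
for event in client.event_stream():
    if event.event_type == "deployment_step_success":
        handle_deployment_step(event)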

During the development/test phase I used a local Marathon installation to provide the events to my code, but when the code went live, problems started to happen.

The code seemed much slower than I expected and the CPU usage was ginormous.

After some quick profiling I discovered that the problem lay in the sseclient lib that is used behind the scenes, so I extracted the bit of code that actually listens for the events and ran another test.

from marathon import MarathonClient

# e.g. marathon_address = "http://marathon.example.com:8080"
marathon_client = MarathonClient(marathon_address)

for _event in marathon_client.event_stream():
    print("Got event: {}".format(_event.event_type))

Note: the Mesos/Marathon cluster in production runs around 60 apps with 200 tasks in total.

This simple code hangs for about two minutes before it outputs its first "Got event whatever"!

Htop output: [screenshot from 2016-08-22 12-02-14]
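A profile in the shape below can be collected with cProfile's module interface; the exact command isn't shown here, but something along these lines works:

python -m cProfile -s cumulative events.py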

Here's the cProfile output:

17857859 function calls (16810521 primitive calls) in 425.454 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    143/1    0.007    0.000  425.454  425.454 {built-in method exec}
        1    0.001    0.001  425.454  425.454 events.py:1(<module>)
       22    0.001    0.000  425.326   19.333 client.py:696(event_stream)
       38    4.582    0.121  425.004   11.184 sseclient.py:58(__next__)
   522457    0.650    0.000  268.910    0.001 sseclient.py:52(_event_complete)
   522458    0.644    0.000  268.261    0.001 re.py:167(search)
   522464  266.073    0.001  266.073    0.001 {method 'search' of '_sre.SRE_Pattern' objects}
1567259/522420    0.770    0.000  151.506    0.000 {built-in method next}
   522420    0.696    0.000  151.198    0.000 utils.py:368(stream_decode_response_unicode)
   522420    0.279    0.000  149.246    0.000 models.py:672(generate)
   522420    0.975    0.000  148.967    0.000 response.py:336(stream)
   522420    3.639    0.000  146.838    0.000 response.py:276(read)
   522420    1.078    0.000  134.903    0.000 client.py:518(read)
   522425    0.731    0.000  133.973    0.000 {method 'read' of '_io._RawIOBase' objects}
   522420    0.656    0.000  133.094    0.000 client.py:549(readinto)
       85    0.000    0.000  132.453    1.558 socket.py:360(readinto)
       85  132.452    1.558  132.452    1.558 {method 'recv_into' of '_socket.socket' objects}
   522420    0.281    0.000  132.438    0.000 {method 'readinto' of '_io.BufferedReader' objects}
   522420    0.976    0.000    3.345    0.000 response.py:180(_init_decoder)

Besides "recv_into" which is too low level, the regex is executes half a million times which is responsible for the most time spent during the whole execution.

Any ideas on how to work around this?

I think the problem lies in this code:

sseclient.py:58

    def _event_complete(self):
        return re.search(end_of_field, self.buf) is not None

    def __iter__(self):
        return self

    def __next__(self):
        while not self._event_complete():
            try:
                nextchar = next(self.resp_iterator)
                self.buf += nextchar
            except (StopIteration, requests.RequestException):
                time.sleep(self.retry / 1000.0)
                self._connect()

                # The SSE spec only supports resuming from a whole message, so
                # if we have half a message we should throw it out.
                head, sep, tail = self.buf.rpartition('\n')
                self.buf = head + sep
                continue
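To make the cost concrete, here is a standalone sketch (not from the issue) that reproduces the pattern above: the buffer grows by one character per iteration, and the regex rescans it from the start every time, so the total work is quadratic in the event size:

import re
import time

end_of_field = re.compile(r'\r\n\r\n|\r\r|\n\n')

def simulate(event_size):
    # Feed one character at a time and recheck the whole buffer after
    # each one, just like __next__/_event_complete above.
    payload = 'x' * event_size + '\n\n'
    buf = ''
    start = time.perf_counter()
    for ch in payload:
        buf += ch
        if end_of_field.search(buf):
            break
    return time.perf_counter() - start

for size in (10000, 20000, 40000):
    # Doubling the event size roughly quadruples the runtime.
    print('{:6d} chars: {:.2f}s'.format(size, simulate(size)))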

Half a million times eh? :(

Surely there is a more efficient way to see if the end-of-field delimiter is in the buffer!

robj@robj-yelp-mb:% python -m timeit -n 10000 -s "import re; x='\r\n\r\n'; y='\r\r'; z='\n\n'; s='helloab'*100000" "x in s or y in s or z in s"
10000 loops, best of 3: 2.15 msec per loop
robj@robj-yelp-mb:% python -m timeit -n 10000 -s "import re; end_of_field = re.compile(r'\r\n\r\n|\r\r|\n\n'); s='helloab'*100000" "end_of_field.search(s)"
10000 loops, best of 3: 5.69 msec per loop

Maybe I've missed something here, but it seems using Python's built-in string matching would be faster? I took the regex from here.

@solarkennedy Yep!
@Rob-Johnson Yep!

The caveat is that Marathon actually returns all tasks from all apps bundled into the event data, so in my case, with 200 tasks, you can imagine the size of the buffer.

If we count 1 byte per character, 500k characters (i.e. 500k one-char iterations) would be... 500 KB of data! OMFG!

In Marathon's SSE case, the event terminator is always the very last thing in the buffer, so the code from sseclient has to scan the whole buffer looking for it on every iteration until it finally arrives. Since the buffer is rescanned from the start after each new character, the total work grows quadratically with the event size.

It would be much easier to read the whole buffer at once and use a simple (yet powerful) string search with ".find()" or "in".
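A rough sketch of that idea (a simplification, not the actual fix): pull the response in bigger chunks and test for the terminator with "in", so each byte is scanned a bounded number of times instead of once per appended character. This assumes events are terminated by a blank line ('\n\n'):

import requests

def events(url):
    resp = requests.get(url, stream=True,
                        headers={'Accept': 'text/event-stream'})
    buf = ''
    # Read 1 KB chunks instead of one character at a time; assumes
    # '\n\n' event terminators, per the SSE blank-line convention.
    for chunk in resp.iter_content(chunk_size=1024, decode_unicode=True):
        buf += chunk
        while '\n\n' in buf:
            event, buf = buf.split('\n\n', 1)
            yield event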

I'll try to post some benchmarks here, but I think it will be orders of magnitude faster than half a million regexp searches.

I solved this problem in #143 =]

@migueleliasweb I've already made a pull request to sseclient which solves this problem:
https://bitbucket.org/btubbs/sseclient/pull-requests/9/response-readline/diff
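Judging by the PR's title, the idea there is to read the response line by line instead of character by character. A minimal sketch of that approach (using requests' iter_lines, not the PR's exact code):

import requests

def events(url):
    resp = requests.get(url, stream=True,
                        headers={'Accept': 'text/event-stream'})
    lines = []
    # SSE frames are groups of non-empty lines terminated by a blank
    # line, so a line-oriented read finds each boundary in one pass.
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            lines.append(line)
        elif lines:
            yield '\n'.join(lines)
            lines = []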