Huge number of threads spun up when processing lots of messages
Hi,
I'm using pysher, which works for me (so thanks!), but with some caveats. I'm connected to a Pusher channel sending tens or hundreds of messages per second, each from a few bytes to 0.5 MB in size (larger messages being rarer, following a log rule).
I observe fairly high CPU usage in the process using the library, and that has caused me trouble on low-end machines (Atom N2800 @ 1.8 GHz). Higher-end machines (i3) handle the load without problems.
So, looking at CPU usage and profiling a bit, I find two things fairly odd. One, it seems two threads are spun off for each message callback and die thereafter. Yet the callback is not processed in the context of either of those two threads; it is processed in the context of the main websocket app thread, it seems.
ok, that's fair enough, why not.
I'm not sure why those two threads are spun off, or what they do before dying. The other thing is that these threads are definitely created in the websocket module, as far as I can make out.
I'm hoping to reduce the CPU load that pysher uses to receive messages. I'm not sure whether that will be achieved by changing the threading situation or something else, but I thought the thread behaviour was worth mentioning, as I found it unusual (I quickly get into tens of thousands of threads created and destroyed after a few minutes) and it certainly seems related to CPU consumption.
It's late for me, so I'm signing off for today. I'm happy to work on solving this later, but through this message I'm hoping to get some kind of hint or architectural description that may help me start on it at least. (And who knows, maybe it'll get solved while I sleep :) )
Cheers.
Hi caliloo,
I addressed this issue today by using a scheduler, so the timer threads are no longer needed. This greatly reduces the CPU load and increases the message processing speed. It is a very rough implementation and needs some refactoring and formal testing, but so far it seems to work.
However, I'm new to using Git(Hub), so I'm not sure how to share it with others (or anything else you're supposed to do on GitHub). But you can find the updated file here:
https://github.com/berentn/Pysher/blob/master/pysher/connection.py
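The gist of it: queue timeouts on one shared sched.scheduler instead of spawning a threading.Timer (and thus an OS thread) per timeout. A rough sketch of the idea (names are illustrative, not the actual connection.py code):

```python
import sched
import threading
import time

# One shared scheduler replaces a fresh threading.Timer (and thus a
# fresh OS thread) for every timeout.
timeout_scheduler = sched.scheduler(time.time, time.sleep)
scheduler_thread = None

def schedule_timeout(delay, callback, *args):
    """Queue a callback; cancel it later with timeout_scheduler.cancel(event)."""
    global scheduler_thread
    event = timeout_scheduler.enter(delay, 1, callback, args)
    # scheduler.run() returns as soon as its queue is empty, so the
    # worker thread has to be restarted on demand.
    if scheduler_thread is None or not scheduler_thread.is_alive():
        scheduler_thread = threading.Thread(
            target=timeout_scheduler.run, daemon=True, name="PysherScheduler"
        )
        scheduler_thread.start()
    return event
```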
Hey @berentn ,
check the faq - I'd love to merge your changes!
https://help.github.com/articles/creating-a-pull-request/
Hi, I've tested the proposed merge.
While I no longer see huge numbers of threads, which is a good thing, I don't see a significant drop in CPU usage (I'm going from 85/95% to 75/85%; I was expecting a drop in excess of 50-80% to get to 45/15% after the change...)
Cheers
@nlsdfnbch
Thanks, I gave it a try :)
@caliloo
I also get an absolute reduction of about 10% on my CPU. However, that is a relative reduction of ~40% in my case. From my profiler I see that stopping the timers is now ~4 times faster and starting them is ~40 times faster.
Maybe your callback functions are using a lot of resources? Did you check what happens to your CPU load when you use a simple callback function which discards the messages?
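Something as trivial as this (name is just illustrative) should isolate the library's own overhead:

```python
def discard_callback(data):
    # Drop the message immediately; whatever CPU load remains is
    # coming from the library, not from the application.
    pass
```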
Yep, I have tried the function that discards messages; it has next to no effect on what I observe, maybe a drop of 1-2% in the numbers I see. It's as if all the load were generated by pysher rather than my callback.
In my normal case, I have actually moved my processing (which currently amounts to disk I/O and not much else) to another process; the callback only packs stuff into a string and posts it to a queue, which is little load AFAIK.
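Roughly like this, for illustration (the queue/worker names are made up, not pysher API):

```python
import multiprocessing as mp

def worker(queue):
    # The separate process does the disk I/O, keeping it entirely off
    # the websocket thread.
    while True:
        line = queue.get()
        with open("messages.log", "a") as f:
            f.write(line + "\n")

work_queue = mp.Queue()

def channel_callback(data):
    # Keep the callback cheap: serialize and enqueue, nothing else.
    work_queue.put(str(data))

if __name__ == "__main__":
    mp.Process(target=worker, args=(work_queue,), daemon=True).start()
```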
What profiler do you use to figure out which thread does what? I'm willing to give it a try.
I constructed a test/benchmark file to compare the timeout scheduling methods using cProfile and pstats. I'm not knowledgeable enough to create usable profiles/stats when running the pusherclient itself.
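The benchmark boils down to something like this (a sketch, under the assumption that the comparison is threading.Timer vs a single sched.scheduler):

```python
import cProfile
import pstats
import sched
import threading
import time

def bench_timers(n):
    # One threading.Timer per timeout: each spawns and tears down an OS thread.
    timers = [threading.Timer(0.01, lambda: None) for _ in range(n)]
    for t in timers:
        t.start()
    for t in timers:
        t.join()

def bench_sched(n):
    # All timeouts queued on a single scheduler, run in the calling thread.
    s = sched.scheduler(time.time, time.sleep)
    for _ in range(n):
        s.enter(0.01, 1, lambda: None)
    s.run()

if __name__ == "__main__":
    for name in ("bench_timers", "bench_sched"):
        cProfile.run(f"{name}(1000)", "timeout.prof")
        pstats.Stats("timeout.prof").sort_stats("cumulative").print_stats(5)
```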
ok, too bad, I was hoping to profile the pusherclient ...
OK, I think I finally cracked this one. After some profiling, I figured out that the websocket module does dumb UTF-8 validation (byte by byte!!) unless you have wsaccel installed. So, one pip install wsaccel later, it looks much better.
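A quick way to check which validator you end up with (the wsaccel import path is real; websocket-client falls back to its pure-Python validator when this import fails, though exact internals may vary by version):

```python
try:
    from wsaccel.utf8validator import Utf8Validator  # C implementation
    print("wsaccel found: fast UTF-8 validation")
except ImportError:
    print("wsaccel not installed: byte-by-byte pure-Python validation")
```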
To summarize: my process was taking about 100-110% CPU (it was crashing regularly for lack of cycles and falling behind). The CPU contributors were:
- SSL: about 10%
- dumb UTF-8 validation: about 60% (vs. wsaccel, which is maybe 5-10%)
- crazy thread spin-up: 15% (vs. sched: 0%)
- the rest: 10-15%
So with SSL, wsaccel, and the scheduler instead of threads, my process shows about 25-30% CPU usage. That's a 4x reduction in CPU usage, which is nice :)
I'd definitely merge the timer scheduler patch if possible; it's much neater.
I've uploaded a new preview version (0.6.0b1). Since tests are currently broken, it would be great if you guys could give that release a spin and see if anything breaks your current code.
Thanks,
I'm giving it a try right now; it has started and run properly for the last 15 minutes. I'll give it a few days, see how it behaves, and report back (I don't expect big stuff to show up, as most of connection.py hasn't changed from berentn's proposal, but pusher.py has seen some churn, so hey :) )
Could you add thread names to the various threads that are spun up? I found that it helped with debugging.
connection.py:72:
```python
self.name = "PysherEventLoop"
```
connection.py:197:
```python
# The two identical create-and-start branches collapse into one check.
if not self.timeout_scheduler_thread or not self.timeout_scheduler_thread.is_alive():
    self.timeout_scheduler_thread = Thread(
        target=self.timeout_scheduler.run, daemon=True, name="PysherScheduler"
    )
    self.timeout_scheduler_thread.start()
```
(I may be wrong, but there may be one other thread flying around; I haven't looked into it further, as this served my purpose.)
I have checked my logs for the past 3 days and couldn't see anything that was a problem. This is good from my POV.