xmppo/node-xmpp-bosh

Prevent XMPP connections to a subset of domains

dhruvbird opened this issue · 19 comments

Allow admin to specify blacklisted domains in config file.
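For illustration, a domain filter of this kind could look roughly like the sketch below. The `routeFilter` object, its `allow`/`deny` keys, and `isDomainPermitted` are hypothetical names chosen for this example, not NXB's actual config options, and the domains are placeholders.

```javascript
// Hypothetical config shape (assumption): not NXB's real option names.
const routeFilter = {
  allow: [],                    // empty list means "no allow-list restriction"
  deny: ["blocked.example"],    // domains the admin wants to refuse
};

// Returns true if a new XMPP stream to `domain` should be attempted.
function isDomainPermitted(domain, filter) {
  const host = String(domain).trim().toLowerCase();
  // A deny entry always wins.
  if (filter.deny.includes(host)) {
    return false;
  }
  // If an allow-list is configured, only listed domains pass.
  if (filter.allow.length > 0) {
    return filter.allow.includes(host);
  }
  return true;
}
```

With this sketch, `isDomainPermitted("blocked.example", routeFilter)` is refused while any other domain passes; filling `allow` turns the same check into a whitelist.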

That's a nice feature :)

Thanks a lot, Dhruv.

Welcome! Let me know if it doesn't work for you or you face any issues with it.

Yep! I'll revert to NXB in case Prosody/Metronome internal BOSH server is not good enough on resources and speed (ATM NXB looks much lighter and faster, good news for you!).

Re-installed NXB yesterday, running fine for now with only jappix.com and anonymous.jappix.com allowed. It no longer seems to crash because of the TLS socket error issue, since NXB only uses unsecured XMPP connections now.

(y)
It would be nice to get a handle on why it's crashing though since it might be useful to know and report it.
Let me know if you have some bandwidth to collect the gdb backtraces.

Yep, since I'm in production I will wait for NXB to become unstable again to debug ;)

BTW: NXB BOSHd + Metronome/Prosody XMPPd are great together for resources, here's what I get on my 512MB RAM powered VPS: https://stats.jappix.com/ :) (only half of the RAM used for a bunch of opened c2s, s2s and BOSH - NXB consumes less than Metronome/Prosody's built-in BOSH while on unsecured NXB-XMPPd connections).

It's strange that the internal BOSH consumes more resources than an external one. There's probably a lot of room for optimization, though it depends on how it's implemented. I hope someone fixes that so it uses less memory.

Yes, but node is really designed for that purpose: HTTP + real-time. And yes, I've got a friend working on this; he is the author of Metronome, a Prosody fork.

See that: https://bind.jappix.com/ - I'm still under DDoS but everything is okay, but the number of sessions that are closed keeps growing. Maybe your counter has to be set after the route blacklist/whitelist check, uh? :)

Yep - the counter increments as soon as a session is established. This is required since the session establishment doesn't need creds, and is a BOSH artifact rather than an XMPP artifact. The counter that's wrong is the one for the streams. That needs fixing. Thanks for pointing it out!
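As a rough sketch of the distinction described above: the session counter is bumped as soon as a BOSH session exists, while the stream counter could be bumped only once the target domain has passed the blacklist/whitelist check. The `BoshStats` class, its field names, and the injected `isDomainPermitted` predicate are all hypothetical and do not reflect NXB's actual code.

```javascript
// Hypothetical stats tracker (assumption): names are illustrative only.
class BoshStats {
  constructor(isDomainPermitted) {
    this.isDomainPermitted = isDomainPermitted; // predicate: domain -> boolean
    this.sessionsCreated = 0;
    this.streamsOpened = 0;
    this.streamsRejected = 0;
  }

  // A BOSH session needs no credentials, so it is counted immediately.
  recordSession() {
    this.sessionsCreated += 1;
  }

  // A stream is counted as opened only after the route check passes.
  recordStreamRequest(domain) {
    if (!this.isDomainPermitted(domain)) {
      this.streamsRejected += 1;
      return false;
    }
    this.streamsOpened += 1;
    return true;
  }
}
```

Keeping rejected streams in their own counter means a flood of requests to filtered domains shows up separately instead of inflating the opened-stream total.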

Yep, by reading the code 2 hours ago that's what I understood ;)

As the author, I've been trying to troubleshoot and replicate the issue, and after 2 weeks of testing I think I've come up with an explanation.

I tried hammering Metronome with 600 BOSH clients sending around 2000-3000 requests/s and scrambling with unclean session closes, to no avail: I only ever got a small, consistent memory increase (even on 64 bits).

So today I instead tried replicating it by putting lighttpd in the middle, reverse proxying to Metronome's BOSH, and to my surprise lighttpd started chewing up several dozen megabytes after a few hours, with rather sudden RES increases.

What puzzles me is why that doesn't seem to happen with nxb.

@maranda Sorry, I didn't get this bit:

> I tried hammering Metronome with 600 bosh clients sending around 2000-3000 requests/s and scrambling with unclean session closes, to no avail considering I kept to only get a consistent small memory increase (on 64 bits even).

@dhruvbird as I said, I have been trying to find out why, on unsecured connections, Metronome's internal BOSH would use more resources than NXB, and I couldn't replicate it. By "scrambling with unclean session closes" I meant that during the stress test I had the drones uncleanly close (not terminate) the BOSH sessions and reconnect, to double the severity of the hammering :).

@maranda No idea - it could be because of the work that the node.js & the v8 team have done with their JIT/GC. Probably Lua handles things differently. Can you check if the XMPP connections on the upstream XMPP server are the same at peak or if they differ?

The problem isn't Lua or Metronome, whose memory usage looks steady and consistent. It's lighttpd that is choking and chewing up memory in my tests (and that's even more evident now that I've optimized the HTTP server a little further).

The only difference I see is that Metronome correctly adds a "Connection: Keep-Alive" header to responses by default, as per spec, when the request is HTTP/1.1, unless the requesting entity states otherwise with a "Connection: close" header. Neither NXB (?) nor lighttpd does that.

@maranda From http://en.wikipedia.org/wiki/HTTP_persistent_connection#HTTP_1.1

In HTTP 1.1, all connections are considered persistent unless declared otherwise.[1] The HTTP persistent connections do not use separate keepalive messages, they just allow multiple requests to use a single connection.

You could try tuning lighttpd's connection timeout and try again.
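The persistence rule quoted above can be sketched as a small helper. `isPersistent` is a hypothetical function written for this example; it takes the HTTP version string and the raw `Connection` header value (in Node these would typically come from `req.httpVersion` and `req.headers.connection`).

```javascript
// Hypothetical helper (assumption): decides whether a connection stays open.
function isPersistent(httpVersion, connectionHeader) {
  // The Connection header is a comma-separated list of case-insensitive tokens.
  const tokens = String(connectionHeader || "")
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);

  // An explicit "close" token always ends persistence.
  if (tokens.includes("close")) {
    return false;
  }
  // HTTP/1.1: persistent unless declared otherwise.
  if (httpVersion === "1.1") {
    return true;
  }
  // HTTP/1.0: persistent only if the peer asked for keep-alive.
  return tokens.includes("keep-alive");
}
```

Under this rule an HTTP/1.1 response with no `Connection` header at all is still persistent, which is why sending "Connection: Keep-Alive" is optional there.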

Yes, that's what I said, but neither NXB nor lighttpd adds a Connection header to responses (it's entirely optional as long as the close token isn't specified). I wonder if that's what causes lighttpd's proxy to misbehave with Metronome's internal BOSH as opposed to NXB. If so, that'd be a bug in lighttpd, which I'm not very worried about :)

Either way, it would be good to disable it and test, just to be sure.