ezmobius/nanite

Client-agent requests get multiplied

nofxx opened this issue · 10 comments

Suppose a CLI mapper sends a push to a client-agent:

push('/client_agent/foo', "hi")

And this client-agent sends a push (or request) to a main-agent:

push('/main_agent/foo', ["bar"] )

The main agent receives 2 requests instead of one. If I have, for instance, 8 Thins running, the main agent receives 10 exactly equal requests.
It looks like the requests are being multiplied by each mapper online at that time.
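
To make the setup concrete, a minimal sketch of what the two sides might look like. This assumes a Nanite actor on the client agent and that the agent forwards work through its mapper proxy; the class and service names are just the ones from the example above.

    require 'nanite'

    # Actor registered on the client agent, exposing /client_agent/foo.
    class ClientAgent
      include Nanite::Actor
      expose :foo

      def foo(payload)
        # Forward the work to the main agent. Assumes the agent has a mapper
        # proxy available to send pushes of its own.
        Nanite::MapperProxy.instance.push('/main_agent/foo', ['bar'])
      end
    end

    # On the CLI-mapper side, the push from the example above:
    Nanite.push('/client_agent/foo', 'hi')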

I'm guessing you're not using Redis, correct? If you aren't, all mappers will get the agent's request, and therefore all of them will forward it to an agent of their own. When using Redis, only one mapper will get the request and forward it. It's probably a matter of discussion whether this is a bug or a feature. I guess it wouldn't hurt to enable exclusiveness on the request queue when not using Redis as well, but there may be situations where not all mappers know the appropriate agents yet to handle that request. That again wouldn't hurt too much when using an offline queue, since that'd eventually lead to the right agent getting the message.
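
For intuition, this is ordinary AMQP behaviour at work. A generic sketch with the old amqp gem (not nanite's actual code): every queue bound to a fanout exchange receives its own copy of a published message, whereas consumers sharing a single named queue split the messages between them.

    require 'rubygems'
    require 'mq'  # the (old) amqp gem used by nanite

    AMQP.start do
      amq = MQ.new
      fanout = amq.fanout('request')

      # One queue per consumer, each bound to the fanout: every queue gets
      # its own copy of a published message, so both "mappers" see it.
      amq.queue('request-mapper-1', :exclusive => true).bind(fanout).subscribe { |msg| puts "mapper 1 got #{msg}" }
      amq.queue('request-mapper-2', :exclusive => true).bind(fanout).subscribe { |msg| puts "mapper 2 got #{msg}" }

      # By contrast, consumers sharing one named queue split the messages,
      # so only one of them would get each request:
      #   amq.queue('request').bind(fanout).subscribe { |msg| ... }

      fanout.publish('hi')
    end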

Hi Matt, thanks for the quick reply.
Yup, I'm not using Redis. Hm, lots of options:
Use Redis, since this really is a bug here otherwise, heh...
Try to play with a Tokyo adapter (I've already got it running on the server); Redis is a key/value store, right?
Or, if it's not too much to ask, could you give me some directions on how to disable this "feature" and rely on the offline queue?

Redis is key/value, that's correct. I don't have a solution for it per se. A quick fix would be to look into cluster.rb: in setup_request_queue it's using an exclusive queue when using Redis. You could try always using that, i.e. remove the shared_state? check. The offline queue is a simple command-line switch (--offline-failsafe) for the nanite-agent and a parameter for the mapper class (:offline_failsafe => true). It's just an additional sanity check, which you should do anyway when you're relying on your messages being delivered.
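
For orientation, the method in question looks roughly like this (a paraphrased sketch of cluster.rb, not the exact source); the branch on shared_state? is the part to experiment with:

    # Approximate shape of Nanite::Cluster#setup_request_queue (paraphrased).
    def setup_request_queue
      handler = lambda { |msg| handle_request(serializer.load(msg)) }
      req_fanout = amq.fanout('request', :durable => true)
      if shared_state?
        # With Redis: all mappers share one named queue, so the broker hands
        # each request to just one of them.
        amq.queue('request').bind(req_fanout).subscribe(&handler)
      else
        # Without Redis: each mapper binds its own queue to the fanout, so
        # every mapper receives its own copy of the request.
        amq.queue("request-#{identity}", :exclusive => true).bind(req_fanout).subscribe(&handler)
      end
    end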

The worst-case scenario should be avoidable when using the offline queue and having only one mapper pick up the message from the request queue. Let me know if that works; maybe it's worth considering getting that fix in.
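
For reference, a sketch of enabling the offline failsafe on both sides, assuming the mapper is started from Ruby via Nanite.start_mapper; the broker settings shown are placeholders.

    require 'nanite'

    # Mapper side: enable the offline failsafe at startup.
    Nanite.start_mapper(
      :host  => 'localhost',      # placeholder broker settings
      :user  => 'mapper',
      :pass  => 'testing',
      :vhost => '/nanite',
      :offline_failsafe => true   # queue requests when no agent can take them right away
    )

    # Agent side: the equivalent command-line switch.
    #   nanite-agent --offline-failsafe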

Oops, now I realize I was private-messaging Matt. Sorry, man.

Same issue with Redis enabled. =/

UPDATE: I've tried editing setup_request_queue every way I could imagine, same result.
Btw, I've got 2 specs failing too, something about the ProxyMapper instance not being erased...

That's a bit odd. Even though we had to fix some issues with Redis and internal timeouts, that solved it for us; since then only one mapper gets the request and forwards it. I'd need some more output from the mapper logs to get a better look at what's going on.

Gosh, I'm embarrassed now. There was a mapper running I didn't see (w/o Redis).
It's working fine with Redis. Really sorry, need to sleep, nanite is giving me some insomnia... (and it feels great).
Thank you Matt, I owe you a (or some) beer. Just let me know when you come to Brazil.

Just to confirm: removing the -#{identity} suffix and the :exclusive option from the amq.queue request setup, it works. Only one request, and without Redis.
I'll be happy to work on a patch to make this an "option" if anyone is interested, or if no better solution comes to light. In the meanwhile I'll keep a fork to make it easy to install on my servers.

Thanks Matt, thanks to all you nanite devs, you guys rock!

I've added an option to the mapper init; well, it works, and I'll try it in production this weekend.
http://github.com/nofxx/nanite/commit/7804058cf297088f063cf5d1d2695c8b15ab71a0
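
Usage looks something like this; the option name below (:exclusive_request_queue) is only a placeholder, the real name is whatever the commit above uses.

    # :exclusive_request_queue is a hypothetical stand-in for the option
    # added in the commit linked above.
    Nanite.start_mapper(
      :host => 'localhost',
      :exclusive_request_queue => false   # share one request queue across mappers
    )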

Gonna write/fix the specs soon... heh, sorry about the Emacs whitespace cleanup too.

Wow, finally, looks like it's working now! ;)
It's only being called once, and offline_failsafe ensures that some agent will find the request. All good.

I was having a weird problem with some actors that use ActiveRecord: they'd just stop advertising their methods. The problem was I didn't know about single-threaded mode... working fine now.
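
For anyone hitting the same thing, a minimal sketch of the setting in question, assuming the agent is started from Ruby and the option is spelled :single_threaded:

    require 'nanite'

    # Run actor methods in a single thread so ActiveRecord connections are
    # not shared across concurrently dispatched requests.
    Nanite.start_agent(
      :root => '/path/to/agent',   # placeholder agent root
      :single_threaded => true     # assumed spelling of the "single-threaded" option mentioned above
    )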

I've added the stumbling blocks I found along the way to the wiki. Again, thanks!
Good to be on nanite, heh...

Just an update: albeit working flawlessly for weeks, something strange happens on deploy (sometimes) with the Rails mappers, if I'm not wrong:

heartbeat-19018d26cdb64d27e25c55d007e73ebb 8149
heartbeat-25941f2a7e262e05e05c2349f08ff468 8153
....

Heartbeats start to accumulate until god restarts RabbitMQ, then everything gets back to normal... heh, weird.
But the heartbeat is about to be gone, right? heh