dashersw/cote

Wasteful communications between instances?

aikar opened this issue · 4 comments

aikar commented

Let me preface this by saying I have not started using cote yet; I am still in the research phase before execution.

I opened this after looking into the internals, and I believe this is a source of @claustres's issue brought up in Slack.
In a full mesh network of 100 instances, I expect 200 messages per heartbeat interval for each instance (100 incoming, 100 outgoing).

Based on this:

    startDiscovery() {

    this.startDiscovery();

    this.sock.on('bind', () => this.startDiscovery());

    this.startDiscovery();

    this.sock.sock.on('bind', () => this.startDiscovery());

Each requester/responder instance starts its own heartbeat process, so with, say, 10 requesters and 10 responders per process, you now go to 20k messages per interval per process (2 million for the entire network per interval).

This is really inefficient.
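
The growth described above can be checked with a back-of-the-envelope sketch. It assumes a simple model that is not taken from cote's code: every discovery instance sends one hello per interval to every other discovery instance and receives one from each. The function name and the counting convention are illustrative, and the exact totals depend on how sends and receives are counted, so they need not match the estimates quoted above. The point is that the load grows quadratically with the number of discovery instances.

```javascript
// Assumed model (not measured cote behaviour): full mesh, one hello per
// interval from every discovery instance to every other discovery instance.
function messagesPerInterval(processes, discoveryPerProcess) {
  const instances = processes * discoveryPerProcess;
  const peers = instances - 1;
  const perInstance = 2 * peers; // hellos sent plus hellos received
  const perProcess = perInstance * discoveryPerProcess;
  const deliveries = instances * peers; // each hello counted once network-wide
  return { instances, perInstance, perProcess, deliveries };
}

// One shared discovery instance per process.
const shared = messagesPerInterval(100, 1);

// One discovery instance per component: 10 requesters + 10 responders.
const perComponent = messagesPerInterval(100, 20);

console.log(shared);
console.log(perComponent);
```

Under this model, a process with a single discovery instance handles 198 messages per interval, while a process with 20 discovery instances handles 79,960.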

Ideally, there should only be a single Discovery instance managed by the cote instance, with each requester/responder tracked by the cote manager and the messages relayed accordingly.
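
A minimal sketch of that idea, assuming a hypothetical per-process hub: `SharedDiscovery`, `register`, `buildHello`, `handleHello`, and the `added` event on this class are names invented for illustration and are not part of cote or node-discover. Components register their advertisements with the hub, the hub emits one hello for the whole process, and incoming hellos are fanned out locally.

```javascript
const { EventEmitter } = require('events');

// Hypothetical per-process hub; not cote's or node-discover's API.
class SharedDiscovery extends EventEmitter {
  constructor(processId) {
    super();
    this.processId = processId;
    this.components = new Map(); // component id -> advertisement
  }

  // A requester/responder registers here instead of starting its own discovery.
  register(id, advertisement) {
    this.components.set(id, advertisement);
  }

  unregister(id) {
    this.components.delete(id);
  }

  // One hello per interval for the whole process, carrying every advertisement.
  buildHello() {
    return {
      processId: this.processId,
      components: [...this.components].map(([id, advertisement]) => ({ id, advertisement })),
    };
  }

  // Relay a remote hello locally: one 'added' event per remote component.
  handleHello(hello) {
    if (hello.processId === this.processId) return; // ignore our own hello
    for (const component of hello.components) {
      this.emit('added', { processId: hello.processId, ...component });
    }
  }
}

const a = new SharedDiscovery('process-a');
const b = new SharedDiscovery('process-b');
a.register('req-1', { type: 'requester', name: 'orders' });
a.register('res-1', { type: 'responder', name: 'payments' });

const seen = [];
b.on('added', (component) => seen.push(`${component.processId}/${component.id}`));
b.handleHello(a.buildHello()); // a single message describes both components
console.log(seen);
```

The trade-off in this sketch is that each hello grows with the number of components registered in the process.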

Moving to a single shared Discovery instance appears to be an API-breaking change if you consider .discovery public API (which it might not be, based on #99), but I believe it is critical to solving wasteful message processing.

claustres commented

I think there are some design choices here. It seems that cote components are not lightweight when it comes to discovery, so it is better to have only a few per process. Using namespaces/keys does not appear to help, because they come into play only after discovery. As a consequence, when you need finer granularity, it seems a good idea to map a set of finer-grained elements onto a single requester/responder or publisher/subscriber in cote. Segmentation then has to be managed after routing. Of course, this also depends on the use case.

Multicast might help with the network overhead, but it does not work in most cloud environments, where you have to rely on Redis instead. Using multiple Redis instances would probably help as well, but at the cost of a more complex system configuration.

Shared discovery would also be a great thing.

Any feedback from @dashersw to confirm this analysis would be appreciated. Maybe it is worth a short doc section?

aikar commented

In the meantime, I have a pending PR to node-discover (upstream) that will reduce the CPU cost of all the additional packets:
wankdanker/node-discover#39

dashersw commented

Sorry to come in late to the discussion. @claustres, you have summed it up quite well; that's the current scenario. I believe centralizing discovery per node, as @aikar suggested earlier, might be a viable approach to reduce network chatter at the cost of larger hello packets. The PR submitted above might help with CPU processing, if we have actual numbers on the performance gains.

Also, in the meantime I recommend checking out https://github.com/dashersw/kotelett, which admittedly lacks documentation and is a heavy WIP, but removes the need for a lot of this fuss. Intelligent routing based on message types removes the need for keys, segregation, and multiple responders/requesters, as they are created dynamically for you. This also results in at most two discovery mechanisms per process, because there is exactly one requester and one responder in each process (these two discovery instances could be reduced further with an approach similar to the one above).

The code also becomes much clearer. There's only const kote = require('kotelett'), kote.send, and kote.on. No instances, no requesters, or anything. Message configuration and routing is dynamic, and it still works for load balancing as well.

Obviously, it was an experiment, so it currently doesn't have pub/sub, but that's also very easy to implement.
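
The routing-by-message-type idea above can be illustrated with a small in-process stand-in. This is only a sketch under stated assumptions: `createBus` is a hypothetical helper, the `on(type, handler)` and `send(message)` shapes are guesses made for illustration rather than kotelett's actual signatures, and nothing here touches the network. A networked version would presumably be asynchronous.

```javascript
// Hypothetical in-process stand-in; NOT kotelett's implementation or API.
function createBus() {
  const handlers = new Map(); // message type -> { list, cursor }

  return {
    // Register a handler for one message type; no keys or namespaces involved.
    on(type, handler) {
      if (!handlers.has(type)) handlers.set(type, { list: [], cursor: 0 });
      handlers.get(type).list.push(handler);
    },

    // Route by message.type, rotating over the handlers registered for it.
    send(message) {
      const entry = handlers.get(message.type);
      if (!entry) throw new Error(`no handler for type "${message.type}"`);
      const handler = entry.list[entry.cursor % entry.list.length];
      entry.cursor += 1;
      return handler(message);
    },
  };
}

const bus = createBus();
bus.on('add', (msg) => msg.a + msg.b);
bus.on('greet', (msg) => `hello ${msg.name}`);

console.log(bus.send({ type: 'add', a: 2, b: 3 }));
console.log(bus.send({ type: 'greet', name: 'cote' }));
```

Because dispatch depends only on the message type, a single entry point per process is enough, which is what keeps the number of discovery instances constant in this sketch.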

aikar commented

FYI, this line (and maybe more) will be an issue with my cache PR, as it mutates the object directly instead of using the .advertisement() method:
https://github.com/dashersw/cote/blob/master/src/components/responder.js#L27
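
To illustrate why direct mutation can conflict with a cache, here is a sketch that assumes the cache stores the serialized hello and is invalidated only through a setter-style method. `CachedAnnouncer`, its `advertise` and `hello` methods, and the sample advertisement fields are hypothetical; they are not taken from node-discover or from the PR.

```javascript
// Hypothetical announcer that caches its serialized hello. This is an
// assumption about how such a cache might work, not node-discover's code.
class CachedAnnouncer {
  constructor(advertisement) {
    this._advertisement = advertisement;
    this._cachedHello = null;
  }

  // Setter-style update: replaces the advertisement and invalidates the cache.
  advertise(advertisement) {
    this._advertisement = advertisement;
    this._cachedHello = null;
  }

  // Serialize once, then reuse the same string until advertise() is called.
  hello() {
    if (this._cachedHello === null) {
      this._cachedHello = JSON.stringify({ advertisement: this._advertisement });
    }
    return this._cachedHello;
  }
}

const announcer = new CachedAnnouncer({ name: 'responder', respondsTo: ['add'] });
announcer.hello(); // primes the cache

// Direct mutation: the cache is not invalidated, so the hello stays stale.
announcer._advertisement.respondsTo.push('subtract');
const stale = announcer.hello();

// Going through the method invalidates the cache, so the change is visible.
announcer.advertise({ name: 'responder', respondsTo: ['add', 'subtract'] });
const fresh = announcer.hello();

console.log(stale.includes('subtract'), fresh.includes('subtract'));
```

In this sketch the mutated advertisement never reaches peers until something invalidates the cache, which is the kind of problem the comment above warns about.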