bakwc/PySyncObj

Dynamic Membership race condition on port binding

MaximilianoFelice opened this issue · 0 comments

According to the Dynamic Membership Change documentation, adding a new node currently requires the following steps:

  1. Call addNodeToCluster(...) with the new node's address:port.
  2. Wait for step 1 to finish.
  3. Build the SyncObj for the new node.

For highly scalable systems this can be a problem, because the new node's port is not actually bound until step 3.

Suppose the following scenario:

  1. New process A somehow finds a free port to bind to.
  2. New process A sends a designated manager on the existing cluster both its address and the port found in 1.
  3. The cluster's designated manager adds the new node.
  4. A different process B on the same machine as process A opens a service on that port. For example, the operating system may pick it as a random port when an HTTP server passes :0 as the port argument.
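
The window in the scenario above can be reproduced with plain sockets. This is a minimal sketch that does not use PySyncObj at all: `find_free_port`, `try_bind`, and `simulate_race` are hypothetical helper names, and it assumes process A discovers its port by binding to port 0 and releasing it, which is only one common way to "find a free port".

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a free port, then release it (step 1 of the scenario)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))  # port 0 lets the OS choose
        return probe.getsockname()[1]
    # The port is unbound again here; nothing reserves it for the caller.


def try_bind(port, host="127.0.0.1"):
    """Return a socket bound to (host, port), or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        return None
    return sock


def simulate_race():
    """Play both roles in one process: A announces a port, B binds it first."""
    port = find_free_port()      # A finds a port and reports it to the cluster
    squatter = try_bind(port)    # B takes the port while A is still waiting
    late = try_bind(port)        # A finally tries to bind, as in step 3 of the docs
    b_got_port = squatter is not None
    a_failed = late is None
    for sock in (squatter, late):
        if sock is not None:
            sock.close()
    return b_got_port, a_failed


if __name__ == "__main__":
    print(simulate_race())
```

Nothing in this sketch is specific to Raft. It only shows that a port number learned ahead of time is a hint, not a reservation, which is why the gap between announcing the port and binding it matters.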

This scenario would leave the application in an inconsistent state: according to the docs mentioned above, you can't add a new node until the previous one has connected. Since process A can no longer bind its port, further node additions become impossible. This is an issue that should be addressed somehow.

Have you found any pattern that could address this issue? If so, could it be added to the docs?