killbill/killbill-commons

Investigate using a third party system to solve distributed synchronization in KB

Opened this issue · 1 comments

Production KB will always involve several instances, and today distributed synchronization is achieved through the use of a distributed lock from the SQL engine. This approach is great because it allows deploying with no extra dependencies, but fundamentally databases are not the best tool for implementing reliable distributed synchronization logic.

We would like to investigate the use of an existing OSS distributed synchronization system. A good place to start would probably be ZooKeeper, or potentially Consul.

The work would be to:

  1. Create a new implementation of the GlobalLocker interface, and extend the configuration to make it one possible option.
  2. Create a new Docker Compose file to start a KB stack with ZooKeeper.
  3. Run the default tests to verify the behavior is as expected.
  4. Extend the tests to create conditions where we hit multiple requests for the same account across different nodes.
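For step 2, a minimal compose file could look like the sketch below. The image tags, the database choice, and the environment variable names are assumptions for illustration; the configuration key selecting the ZooKeeper-based locker does not exist yet and is purely hypothetical.

```yaml
version: "3"
services:
  zookeeper:
    image: zookeeper:3.8            # official ZooKeeper image; version is an assumption
    ports:
      - "2181:2181"
  db:
    image: mariadb:10.6             # any KB-supported database would do
    environment:
      - MYSQL_ROOT_PASSWORD=root
      - MYSQL_DATABASE=killbill
  killbill:
    image: killbill/killbill        # Kill Bill image; exact env vars may differ per version
    ports:
      - "8080:8080"
    environment:
      - KILLBILL_DAO_URL=jdbc:mysql://db:3306/killbill
      - KILLBILL_DAO_USER=root
      - KILLBILL_DAO_PASSWORD=root
      # Hypothetical: the new locker option introduced in step 1 would be wired here
    depends_on:
      - zookeeper
      - db
```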

I would focus first on implementing the distributed locking logic using the ZooKeeper API -- i.e. implementing our GlobalLocker interface on top of ZooKeeper.

For this, I would suggest reusing the WriteLock contrib class, which takes care of the heavy lifting. The overall idea is the following:

  1. Since ZooKeeper exposes a filesystem-like hierarchical namespace, we will use the following structure to store our locks: /kb/locks/accounts/<account_id>/. This is the path passed to the WriteLock class.
  2. The WriteLock code creates an EPHEMERAL (ensuring that if the client gets disconnected, the lock is automatically released) and SEQUENTIAL (ensuring ordering among different clients competing for the same lock) znode below the given path, and, through a callback mechanism, notifies the caller when its znode has the lowest sequence number -- i.e. when it has acquired the lock.
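To make the election mechanism concrete, here is a pure-Java model of it -- not ZooKeeper itself, and not the WriteLock class, just the algorithm it relies on: each contender gets a monotonically increasing sequence number, the lowest live number holds the lock, and removing an entry (unlock, or session loss for an EPHEMERAL znode) promotes the next contender. All names here are illustrative.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for the children of /kb/locks/accounts/<account_id>/
public class SequentialLockModel {
    private final AtomicLong counter = new AtomicLong();
    // sequence number -> contender name (stands in for the child znodes)
    private final ConcurrentSkipListMap<Long, String> contenders = new ConcurrentSkipListMap<>();

    // Analogue of creating an EPHEMERAL+SEQUENTIAL znode below the lock path
    public long enter(final String client) {
        final long seq = counter.getAndIncrement();
        contenders.put(seq, client);
        return seq;
    }

    // A contender owns the lock iff it holds the lowest sequence number
    public boolean isOwner(final long seq) {
        return contenders.firstKey() == seq;
    }

    // Analogue of the znode disappearing (explicit unlock, or client disconnection)
    public void leave(final long seq) {
        contenders.remove(seq);
    }

    public static void main(final String[] args) {
        final SequentialLockModel lock = new SequentialLockModel();
        final long a = lock.enter("node-A");
        final long b = lock.enter("node-B");
        System.out.println(lock.isOwner(a)); // true  -- lowest sequence wins
        System.out.println(lock.isOwner(b)); // false -- must wait for A
        lock.leave(a);                       // A releases (or its session dies)
        System.out.println(lock.isOwner(b)); // true  -- B is now the lowest
    }
}
```

The EPHEMERAL flag is what makes this safe: a crashed node's entry disappears with its session, so the lock can never be held by a dead client.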

There is a good blog post explaining how to create a blocking lock on top of WriteLock.

I would think adapting that code is all we need to implement our GlobalLocker lock.
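The blocking-lock idea amounts to bridging WriteLock's asynchronous callback into a blocking acquire. A minimal sketch of that pattern, with the listener interface modelled after ZooKeeper's org.apache.zookeeper.recipes.lock.LockListener so the example stands alone (the real implementation would use the actual WriteLock and LockListener classes, and the GlobalLocker method names here are assumptions):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Turns an asynchronous lock-acquired callback into a blocking wait with timeout
public class BlockingLockAdapter {

    // Mirrors ZooKeeper's LockListener contract
    public interface LockListener {
        void lockAcquired();
        void lockReleased();
    }

    private final CountDownLatch acquired = new CountDownLatch(1);

    // This listener would be registered on the WriteLock instance
    public final LockListener listener = new LockListener() {
        @Override public void lockAcquired() { acquired.countDown(); }
        @Override public void lockReleased() { /* no-op for this sketch */ }
    };

    // Block until the callback fires, or give up after the timeout --
    // roughly what a GlobalLocker acquire method would call after WriteLock#lock()
    public boolean awaitLock(final long timeout, final TimeUnit unit) throws InterruptedException {
        return acquired.await(timeout, unit);
    }
}
```

A CountDownLatch fits here because acquisition is a one-shot transition: once the callback has fired, every subsequent await returns immediately.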