stellar/go

Horizon reaper can run on multiple horizon nodes at the same time

Closed this issue · 4 comments

SDF uses a Horizon architecture where there are multiple ingesting nodes and multiple request-serving nodes. If reaping is enabled on multiple nodes, there is nothing preventing two Horizon nodes from reaping at the same time. Redundant reaping processes are wasteful because we end up repeating the same delete statements, which puts unnecessary extra load on the Horizon DB.

As a workaround, we can enable reaping on only 1 Horizon node. However, the ideal solution would be to allow multiple Horizon nodes to coordinate so that only 1 node can reap at a time. We already do something similar for ingestion (only 1 Horizon node can ingest at a time).
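To make the proposed coordination concrete, here is a minimal sketch of a "try the lock, skip if busy" reaper. It is not Horizon's implementation: `reapLock` and `reapIfFree` are hypothetical names, and a `sync.Mutex` (with `TryLock`, available since Go 1.18) stands in for whatever lock the nodes would actually share, for example one held in the Horizon DB.

```go
package main

import (
	"fmt"
	"sync"
)

// reapLock is a hypothetical stand-in for a lock shared by all Horizon nodes.
// A sync.Mutex is used only so the sketch runs without a database.
type reapLock struct{ mu sync.Mutex }

// reapIfFree runs reap only if no other node is currently reaping.
// It reports whether this call performed the reap.
func reapIfFree(l *reapLock, reap func()) bool {
	if !l.mu.TryLock() { // another node holds the lock, so skip this cycle
		return false
	}
	defer l.mu.Unlock()
	reap()
	return true
}

func main() {
	lock := &reapLock{}
	deletes := 0

	// Node A reaps; while it holds the lock, node B tries and is turned away.
	var nodeBReaped bool
	nodeAReaped := reapIfFree(lock, func() {
		deletes++
		nodeBReaped = reapIfFree(lock, func() { deletes++ })
	})
	fmt.Println(nodeAReaped, nodeBReaped, deletes) // true false 1
}
```

With this shape, a node that loses the race does no delete work at all; it simply waits for its next reaping cycle.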

Hey Tamir, do you mind explaining how we support multiple ingestion nodes? What's the point of this, too, if only one node can ingest at a time?

@JakeUrban the point of having multiple ingesting nodes is for redundancy in case one of the ingesting nodes fails / crashes.

The way the coordination is implemented is that, when ingesting a new ledger, all the ingesting nodes race to acquire a lock in the postgres db. Only one of the ingesting nodes is able to acquire the lock and that node is responsible for ingesting the new ledger. Once the ledger is ingested, the lock is released and the other nodes who attempted to acquire the lock realize the ledger has already been ingested and so they release the lock as well and wait until the next ledger is emitted by the network.
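The race described above can be simulated in memory. This is only a sketch under stated assumptions: `ledgerStore`, `tryIngest`, and `raceToIngest` are hypothetical names, and a `sync.Mutex` stands in for the lock in the Postgres DB, so none of this reflects Horizon's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// ledgerStore is a hypothetical in-memory stand-in for the Horizon DB.
// The mutex plays the role of the Postgres lock.
type ledgerStore struct {
	mu     sync.Mutex
	latest uint32 // sequence of the latest ingested ledger
}

// tryIngest acquires the lock and ingests ledger seq unless another node
// already did. It reports whether this caller performed the ingestion.
func (s *ledgerStore) tryIngest(seq uint32) bool {
	s.mu.Lock()
	defer s.mu.Unlock() // every node releases the lock, winner or not
	if s.latest >= seq {
		return false // already ingested by the node that won the race
	}
	s.latest = seq
	return true
}

// raceToIngest lets the given number of nodes race to ingest ledger seq
// and returns how many of them actually ingested it.
func raceToIngest(s *ledgerStore, nodes int, seq uint32) int {
	var wg sync.WaitGroup
	results := make([]bool, nodes)
	for i := 0; i < nodes; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = s.tryIngest(seq)
		}(i)
	}
	wg.Wait()

	winners := 0
	for _, won := range results {
		if won {
			winners++
		}
	}
	return winners
}

func main() {
	store := &ledgerStore{latest: 99}
	fmt.Println(raceToIngest(store, 3, 100)) // 1
}
```

Which node wins is not deterministic, but exactly one of them ingests each ledger.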

> the other nodes who attempted to acquire the lock realize the ledger has already been ingested and so they release the lock as well

Got it, how do nodes realize the ledger was ingested? Do they make a DB query to confirm?

@JakeUrban yeah, they query the latest ledger in the DB, and if it's greater than or equal to the ledger they're about to ingest, that means the ledger has already been ingested.
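As a small worked example of that comparison (the function and parameter names are hypothetical, not taken from Horizon), the check could look like this. Using "greater than or equal" rather than plain equality also covers a node that has fallen more than one ledger behind.

```go
package main

import "fmt"

// alreadyIngested reports whether ledger next can be skipped, given the
// sequence of the latest ledger found in the DB.
func alreadyIngested(latestInDB, next uint32) bool {
	return latestInDB >= next
}

func main() {
	fmt.Println(alreadyIngested(100, 100)) // true: another node just ingested it
	fmt.Println(alreadyIngested(101, 100)) // true: this node has fallen behind
	fmt.Println(alreadyIngested(99, 100))  // false: this node should ingest 100
}
```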