Improve ageoff iterator remove/add logic in DataStoreImpl
Closed this issue · 1 comments
Removing and re-adding the metrics and meta ageoff iterators when the Timely server starts is a way to ensure that the configuration in timely.yml is applied to Accumulo. However, when multiple Timely servers start at the same time, their Accumulo table operations can interfere with one another, causing a distributed race condition.
The previous way of handling this was to ignore conflicts, since each server was removing and re-applying the same iterator (in the same Accumulo). A problem occurs when the add operations conflict and are ignored, and the last operation to succeed was a remove. This leads to major compactions being queued up and the data not being removed.
We should retry the remove and add operations a sufficient number of times to ensure that the final state has the ageoff iterators and settings applied.
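A minimal sketch of that retry loop, assuming a hypothetical `IteratorOps` interface that stands in for the Accumulo table operations DataStoreImpl performs (the interface, method names, and backoff policy are illustrative and not taken from the Timely code base). Each attempt removes and re-adds the iterator, then verifies the final state before declaring success:

```java
public class AgeoffRetry {

    /** Hypothetical stand-in for the Accumulo table operations used by DataStoreImpl. */
    public interface IteratorOps {
        void removeAgeoffIterator() throws Exception;
        void addAgeoffIterator() throws Exception;
        boolean ageoffIteratorPresent() throws Exception;
    }

    /**
     * Removes and re-adds the ageoff iterator until it is verified present.
     * Returns the number of attempts used; throws if maxAttempts is exhausted.
     */
    public static int applyWithRetry(IteratorOps ops, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                ops.removeAgeoffIterator();
                ops.addAgeoffIterator();
                // Only stop once the desired final state is confirmed.
                if (ops.ageoffIteratorPresent()) {
                    return attempt;
                }
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e; // conflict with another server; retry instead of ignoring
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMillis * attempt); // simple linear backoff
            }
        }
        throw new IllegalStateException(
                "ageoff iterator not applied after " + maxAttempts + " attempts", last);
    }
}
```

The key difference from ignoring conflicts is the verification step: a failed add after a successful remove triggers another attempt rather than leaving the table without the iterator.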
Instead of having every server remove and add the iterators and settings, we will use the same LeaderLatch pattern that the Balancer uses to elect a lead server and that lead server will remove and add the iterator settings.
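The leader-only approach can be sketched as follows. This is an illustration under stated assumptions, not the Timely implementation: the class and method names are hypothetical, and leadership is abstracted as a `BooleanSupplier` so the example needs no external libraries. Assuming the LeaderLatch in question is Apache Curator's, its `hasLeadership()` method could be passed in as `latch::hasLeadership`.

```java
import java.util.function.BooleanSupplier;

public class LeaderOnlyAgeoff {

    /**
     * Runs the remove/add of the ageoff settings only on the elected leader.
     * Returns true if this server performed the work, false if it skipped it.
     */
    public static boolean runIfLeader(BooleanSupplier isLeader, Runnable applyAgeoffSettings) {
        if (!isLeader.getAsBoolean()) {
            // Non-leaders leave the Accumulo table configuration untouched.
            return false;
        }
        applyAgeoffSettings.run();
        return true;
    }
}
```

With only one server touching the iterator settings, the remove and add operations no longer race against the same operations from other servers.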