Netflix/hollow

Multiple Hash Index reindexes when multiple deltas exist?

adwu73 opened this issue · 2 comments

Hi, there:

First, thanks for this great product! We are building an advanced version of Trello with very flexible filtering capabilities using Hollow. We now have a customer with 1000 people using this tool to manage their daily development work.

However, we are facing slow HashIndex reindexing in the morning, when everyone is using the tool.

We have a card server that produces a Hollow delta every second, and a view server that gets the delta updates via NFS and reindexes its Hash Indexes every second. We have 30 Hash Indexes under the card consumer. Under heavy load, we found that Hash Index reindexing may take several minutes: we can already see the updated data through the explorer, but the new data isn't available until the reindexing has finished.

It seems that Hash Key Reindex is triggered after every delta update.

We want to know whether this is intentional. We found that if we set the producer to produce a snapshot for every delta, instead of 1 snapshot per 10 deltas, the reindexing time is reduced.

We also found that although our server has 8 cores, only 4 are used even under heavy reindexing. Can we make the reindexing parallel to make use of the available CPU power?

Regards!

Adam

We have a card server that is producing hollow delta every seconds, and a view server get delta update via nfs and reindex Hash Index every second.

Cool!

One update per second goes beyond our original intended usage, so I wouldn't be surprised if you're bumping up against some limitations in Hollow's design.

We built for use-cases with updates on the order of minutes. We do have teams within Netflix using it for updates every 30 seconds (and there is demand for supporting more frequent updates).

It seems that Hash Key Reindex is triggered after every delta update.
...
We want to know whether this is done intentionally?

Sort of.

When a consumer updates to a newer version it will do one of these:

  • apply 1 snapshot and 0 or more deltas
  • apply 1 or more deltas

WRT indexes, the original intent was to do a full re-index on snapshot and an incremental update on delta.

HollowPrimaryKeyIndex has incremental update implemented. In 2019 we made it configurable (defaulting to false) due to an issue with occasional lockups that we suspect, but never proved, are in the incremental index update code. In effect, HollowPrimaryKeyIndex always does a full re-index unless you re-enable the incremental update.

HollowHashIndex never had incremental update implemented.

So...given that both indices will always do a full re-index, it would be optimal (for a given consumer refresh) to wait until all updates are applied and re-index at the end. The current HollowTypeStateListener doesn't allow this.

One way you can work around this:

  1. Don't subscribe the indexes for updates directly. If using HollowHashIndex, don't call listenForDeltaUpdates(); if using HashIndex or HashIndexSelect, don't call consumer.addListener(index).
  2. Implement your own HollowConsumer.RefreshListener. In the refreshSuccessful() callback, create new hash index instances.
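
A rough, self-contained sketch of why rebuilding once per refresh helps is below. It does not use the Hollow API: `CardState`, `OwnerIndex`, `RefreshCallbacks` and both reindexer classes are hypothetical stand-ins invented for illustration. In real code the rebuild would sit in the refreshSuccessful() callback of your own HollowConsumer.RefreshListener, as in step 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the consumer's data (NOT a Hollow type).
final class CardState {
    final Map<Integer, String> ownerByCardId = new HashMap<>();
}

// Hypothetical index with no incremental update: it can only be built in full.
final class OwnerIndex {
    private final Map<String, List<Integer>> cardIdsByOwner = new HashMap<>();

    OwnerIndex(CardState state) {
        for (Map.Entry<Integer, String> e : state.ownerByCardId.entrySet()) {
            cardIdsByOwner.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
    }

    List<Integer> findCards(String owner) {
        return cardIdsByOwner.getOrDefault(owner, Collections.<Integer>emptyList());
    }
}

// Hypothetical callbacks, loosely modeled on "a delta was applied" and "the refresh finished".
interface RefreshCallbacks {
    void deltaApplied(CardState state);
    void refreshSuccessful(CardState state);
}

// Rebuilds after every delta: one full build per delta in the refresh.
final class PerDeltaReindexer implements RefreshCallbacks {
    int builds = 0;
    final AtomicReference<OwnerIndex> index = new AtomicReference<>();

    public void deltaApplied(CardState state) { index.set(new OwnerIndex(state)); builds++; }
    public void refreshSuccessful(CardState state) { }
}

// Rebuilds once, after all deltas of the refresh have been applied.
final class RefreshEndReindexer implements RefreshCallbacks {
    int builds = 0;
    // Readers keep seeing the old index until the new one is fully built and swapped in.
    final AtomicReference<OwnerIndex> index = new AtomicReference<>();

    public void deltaApplied(CardState state) { }
    public void refreshSuccessful(CardState state) { index.set(new OwnerIndex(state)); builds++; }
}

final class ReindexDemo {
    // Simulates one consumer refresh that applies `deltas` deltas in a row.
    static void refresh(CardState state, int deltas, List<RefreshCallbacks> listeners) {
        for (int d = 0; d < deltas; d++) {
            state.ownerByCardId.put(d, d % 2 == 0 ? "adam" : "eve");
            for (RefreshCallbacks l : listeners) l.deltaApplied(state);
        }
        for (RefreshCallbacks l : listeners) l.refreshSuccessful(state);
    }

    public static void main(String[] args) {
        PerDeltaReindexer perDelta = new PerDeltaReindexer();
        RefreshEndReindexer atEnd = new RefreshEndReindexer();
        refresh(new CardState(), 10, Arrays.asList(perDelta, atEnd));
        System.out.println("per-delta full builds:   " + perDelta.builds);
        System.out.println("refresh-end full builds: " + atEnd.builds);
    }
}
```

For a refresh that applies 10 deltas, the per-delta listener does 10 full builds and the refresh-end listener does 1. With 30 indexes, that difference is multiplied by 30 whenever the consumer falls behind and has to apply several deltas in one refresh.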

Thanks for the quick response!

We will try to modify our code, and we also want to try doing the reindexing in parallel. If we succeed, would it be possible for us to contribute the code back?

If so, are there any design suggestions we should follow?