ReStoreCpp/ReStore

Restore replication after a failure

Right now, when too many ranks have failed (so that every replica of some block is gone), ReStore might lose blocks permanently. In this case, an exception is thrown and the user has to obtain the data in some other way (usually by reloading it from disk). This could be improved by restoring the specified replication level after each failure, while copies of the affected blocks still exist on living ranks.

The blocks stored on a failed rank should be assigned to new (living) ranks. For better performance, this should be done in a way that reassigns only the blocks that were stored on dead ranks and leaves all blocks stored on living ranks untouched. A possible approach is consistent hashing or a similar technique.
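
To make the idea concrete, here is a minimal sketch using rendezvous (highest-random-weight) hashing, one of the techniques similar to consistent hashing. It is not ReStore's current block distribution or API: the names `score` and `ranksForBlock`, the integer block IDs, and the mixing function are all assumptions made for illustration. Each block is placed on the `replicas` living ranks with the highest pseudo-random score for that block, so removing a dead rank only changes the placement of replicas that lived on that rank.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ReStore): mixes a (block, rank) pair into a
// pseudo-random 64-bit score using the splitmix64 finalizer.
inline std::uint64_t score(std::uint64_t block, std::uint64_t rank) {
    std::uint64_t x = block * 0x9E3779B97F4A7C15ULL + rank + 1;
    x ^= x >> 30;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27;
    x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

// Hypothetical helper (not part of ReStore): returns the ranks that should hold
// `block`, namely the `replicas` living ranks with the highest score. If fewer
// ranks are alive than replicas requested, all living ranks are returned.
std::vector<int> ranksForBlock(
    std::uint64_t block, const std::vector<int>& aliveRanks, std::size_t replicas) {
    std::vector<std::pair<std::uint64_t, int>> scored;
    scored.reserve(aliveRanks.size());
    for (int rank: aliveRanks) {
        scored.emplace_back(score(block, static_cast<std::uint64_t>(rank)), rank);
    }

    // Only the top-k entries need to be ordered.
    const std::size_t k = std::min(replicas, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(), std::greater<>());

    std::vector<int> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        result.push_back(scored[i].second);
    }
    return result;
}
```

After a failure, every surviving rank could recompute `ranksForBlock` with the reduced list of living ranks. A block whose replica set did not include the dead rank keeps exactly the same set. A block that lost a replica keeps its remaining holders and gains one new rank, which then has to receive a copy from one of the survivors. The cost of this sketch is one score per living rank for each block, so a real implementation would probably apply it to ranges of blocks rather than to single blocks.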