Implement cross-batch memory for losses
monatis opened this issue · 2 comments
monatis commented
- Paper: https://arxiv.org/pdf/1912.06798.pdf
- Reference for implementation: https://github.com/msight-tech/research-xbm/
How it works
- XBM relies on the observation that embedding drift is slow during training, i.e., embeddings of the same object change at a very slow pace.
- This lets us add embeddings and targets in a ring buffer of a certain size.
- After a certain number of iterations, start using the buffer. Now the final loss is the weighted sum of the actual mini-batch loss and the ring buffer loss.
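The mechanics above can be sketched in a few lines of Python. All names, the dummy loss, and the constants are illustrative assumptions; `collections.deque` with `maxlen` stands in for the ring buffer (oldest entries are evicted automatically when it is full):

```python
from collections import deque

BUFFER_SIZE = 4   # illustrative; real memory banks are much larger
XBM_WEIGHT = 0.5  # illustrative weight for the memory term
memory = deque(maxlen=BUFFER_SIZE)  # a deque with maxlen behaves as a ring buffer

def dummy_loss(batch, reference):
    # Stand-in for a pairwise metric-learning loss between two sets of embeddings.
    return sum(abs(b - r) for b in batch for r in reference) / (len(batch) * len(reference))

def xbm_loss(batch_embeddings, start_reached: bool):
    batch_term = dummy_loss(batch_embeddings, batch_embeddings)
    memory.extend(batch_embeddings)  # enqueue batch; oldest entries are evicted when full
    if not start_reached:            # before start_iteration: plain mini-batch loss
        return batch_term
    # After start_iteration: weighted sum of the batch loss and the memory-bank loss.
    memory_term = dummy_loss(batch_embeddings, list(memory))
    return batch_term + XBM_WEIGHT * memory_term
```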
Suggested implementation
- Introduce an `XBMConfig` class to hold the configuration values such as `buffer_size`, `start_iteration`, `xbm_weight`.
- Add a `configure_xbm()` hook in `TrainableModel` that returns `None` by default.
- If it returns an `XBMConfig` instance instead, create an `XBMBuffer` instance in the `TrainableModel` constructor.
- Implement the XBM logic in `_common_step` if `stage` is training.
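A minimal sketch of the configuration and hook wiring, assuming the class and field names from the list above (the defaults, the `MyModel` subclass, and the use of a plain deque in place of `XBMBuffer` are invented for illustration):

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class XBMConfig:
    """Holds the XBM settings; field names follow the suggestion above."""
    buffer_size: int = 8192      # illustrative default
    start_iteration: int = 1000  # illustrative default
    xbm_weight: float = 1.0      # illustrative default


class TrainableModel:
    """Simplified stand-in for the real base class."""

    def __init__(self):
        config = self.configure_xbm()
        self._xbm_config = config
        # Create the buffer only when a subclass opts in by returning a config.
        self._xbm_buffer = deque(maxlen=config.buffer_size) if config else None

    def configure_xbm(self) -> Optional[XBMConfig]:
        """Hook: return an XBMConfig to enable XBM; None (default) disables it."""
        return None


class MyModel(TrainableModel):
    """Example subclass that opts into XBM."""

    def configure_xbm(self) -> Optional[XBMConfig]:
        return XBMConfig(buffer_size=4096, xbm_weight=0.5)
```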
Notes
- We cannot re-use the existing `Accumulator` classes because they are not ring buffers.
- I don't think we need a mixin because the addition to `TrainableModel` will be only a few lines of code, and we need to update `_common_step` anyway.
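Putting it together, the training branch of `_common_step` could look roughly like the following. Here `loss_fn` is a placeholder returning a float so the sketch runs without a deep-learning framework, and every name apart from the weighted-sum logic is an assumption:

```python
from collections import deque


class Model:
    """Stand-in showing only the XBM branch of a _common_step-style method."""

    def __init__(self, buffer_size=8, start_iteration=2, xbm_weight=0.5):
        self.start_iteration = start_iteration
        self.xbm_weight = xbm_weight
        self.global_step = 0
        self._xbm_buffer = deque(maxlen=buffer_size)  # ring buffer of (embedding, target)

    def loss_fn(self, embeddings, targets, ref_embeddings=None, ref_targets=None):
        # Placeholder: a real implementation computes a pairwise metric loss
        # between the batch and (optionally) the memory bank.
        ref = ref_embeddings if ref_embeddings is not None else embeddings
        return float(len(embeddings) * len(ref))

    def _common_step(self, embeddings, targets, stage):
        loss = self.loss_fn(embeddings, targets)
        if stage == "train":
            # Enqueue the current batch; in practice the embeddings would be
            # detached so the buffer does not keep autograd graphs alive.
            self._xbm_buffer.extend(zip(embeddings, targets))
            if self.global_step >= self.start_iteration:
                ref_embeddings, ref_targets = zip(*self._xbm_buffer)
                # Final loss = batch loss + weight * loss against the memory bank.
                loss += self.xbm_weight * self.loss_fn(
                    embeddings, targets, ref_embeddings, ref_targets
                )
            self.global_step += 1
        return loss
```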