Does the code support end-to-end fine-tuning, including the retriever?
The REALM paper notes that the retriever was kept frozen for downstream tasks. What about a task like domain-specific open-domain question answering? In that scenario, can we train the entire REALM model with this code?
If yes, we might be able to compare the results with RAG-end2end.
As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means that index updates are not performed at that stage. Therefore, for domain-specific QA, we would first have to pre-train REALM to get domain-specific evidence blocks (retriever), and then fine-tune on the given dataset.
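To make the "frozen evidence blocks" point concrete, here is a minimal toy sketch, not the actual REALM code. It assumes, as I understand the paper, that the block embeddings are computed once and left untouched during fine-tuning while the query side can still change. The names `retrieve_top_k` and `FROZEN_BLOCK_EMBS` and the 2-dimensional vectors are made up for illustration, and brute-force scoring stands in for a real MIPS index.

```python
def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_emb, block_embs, k):
    """Brute-force maximum inner product search: indices of the k best blocks."""
    scores = [inner_product(query_emb, b) for b in block_embs]
    ranked = sorted(range(len(block_embs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical block embeddings, computed once and never updated in fine-tuning.
FROZEN_BLOCK_EMBS = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
]

# Only the query embedding moves as the query encoder is fine-tuned (toy values).
query_before = [1.0, 0.2]
query_after = [0.2, 1.0]

top_before = retrieve_top_k(query_before, FROZEN_BLOCK_EMBS, k=2)
top_after = retrieve_top_k(query_after, FROZEN_BLOCK_EMBS, k=2)
print(top_before, top_after)
```

The retrieved set can still shift because the query vector changes, but the blocks themselves stay fixed, so they never adapt to a new domain unless pre-training is redone.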
Exactly, but the pre-training part has not been fully ported to PyTorch, in particular the asynchronous MIPS refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation and then fine-tune in PyTorch.
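For readers unfamiliar with ICT, a rough sketch of how one training pair could be built, assuming the usual formulation: one sentence of a passage serves as a pseudo-query and the remaining sentences as pseudo-evidence. The function `make_ict_example` and its parameters are hypothetical and are not taken from the TF or PyTorch code.

```python
def make_ict_example(sentences, target_idx, keep_target_in_context=False):
    """Build one (pseudo_query, pseudo_evidence) pair from a list of sentences.

    keep_target_in_context is an illustrative switch: when True, the chosen
    sentence is left in the evidence instead of being removed.
    """
    pseudo_query = sentences[target_idx]
    if keep_target_in_context:
        context = list(sentences)
    else:
        context = sentences[:target_idx] + sentences[target_idx + 1:]
    return pseudo_query, " ".join(context)


passage = [
    "Zebras have four gaits.",
    "They are generally slower than horses.",
    "Their stamina helps them outrun predators.",
]
query, evidence = make_ict_example(passage, target_idx=1)
print(query)
print(evidence)
```

The retriever is then trained to score the true evidence above other passages in the batch, which is what gives it a usable starting point before the full pre-training begins.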
Thanks a lot for your insight. Anyway, this end-to-end fine-tuning will be very expensive.
@qqaatw, is it part of the roadmap to port the pre-training part to PyTorch?
It was part of the roadmap, but now I'm wondering whether it is worth porting.
You can see the configuration of their experiments:
Pre-training We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate
of 3e-5, using BERT’s default optimizer. The document embedding step for the MIPS index is parallelized over 16
TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.
This setup leveraged a large array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I don't think a regular deep learning workstation could reproduce results similar to theirs.
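To put the quoted configuration in perspective, here is a small sketch of what "retrieve and marginalize over 8 candidate documents" amounts to, plus a back-of-envelope count of document-conditioned forward passes. It assumes the standard form p(y|x) = Σ_z p(z|x) · p(y|z,x), with p(z|x) a softmax over retrieval scores. The function name and the toy inputs are mine, and the count ignores index refreshes and the document embedding step.

```python
import math


def marginal_likelihood(retrieval_scores, answer_probs):
    """Sum over candidates z of softmax(retrieval_scores)[z] * answer_probs[z]."""
    m = max(retrieval_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in retrieval_scores]
    total = sum(exps)
    p_z = [e / total for e in exps]
    return sum(pz * py for pz, py in zip(p_z, answer_probs))


# Numbers taken from the quoted pre-training configuration.
STEPS = 200_000
BATCH_SIZE = 512
CANDIDATES = 8

# One document-conditioned pass per (example, candidate) pair.
reader_passes = STEPS * BATCH_SIZE * CANDIDATES
print(f"{reader_passes:,} document-conditioned forward passes")
```

That is 819,200,000 passes before counting any of the retrieval and refresh work, which gives a sense of why this is out of reach for a single workstation.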