qqaatw/pytorch-realm-orqa

Does the code support the entire end-to-end fine-tuning, including the retriever?

Opened this issue · 7 comments

The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM model with this code?

If yes, we might be able to compare results with RAG-end2end:

https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever

As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means no index updates are performed at that stage. Therefore, for domain-specific QA, we would first have to pre-train REALM to obtain domain-specific evidence blocks (retriever), and then fine-tune it on a given dataset.
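
For reference, here is a minimal sketch of what a single fine-tuning step with a frozen evidence-block index looks like, assuming the Hugging Face `transformers` REALM classes that this repo builds on (the checkpoint name, question/answer strings, and the one-step training loop are purely illustrative, not this repo's exact API):

```python
# Minimal sketch (assumed API: Hugging Face transformers REALM classes).
# The evidence-block embeddings live in a fixed, pre-computed buffer, so
# fine-tuning never refreshes the MIPS index; gradients only reach the
# query embedder and the reader.
import torch
from transformers import RealmForOpenQA, RealmRetriever, RealmTokenizer

checkpoint = "google/realm-orqa-nq-openqa"  # illustrative released checkpoint
retriever = RealmRetriever.from_pretrained(checkpoint)  # frozen evidence blocks
tokenizer = RealmTokenizer.from_pretrained(checkpoint)
model = RealmForOpenQA.from_pretrained(checkpoint, retriever=retriever)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# RealmForOpenQA expects a single question per forward pass.
question_ids = tokenizer(
    ["Who is the pioneer in modern computer science?"], return_tensors="pt"
)
answer_ids = tokenizer(
    ["alan mathison turing"],
    add_special_tokens=False,
    return_token_type_ids=False,
    return_attention_mask=False,
).input_ids

reader_output, predicted_answer_ids = model(
    **question_ids, answer_ids=answer_ids, return_dict=False
)
reader_output.loss.backward()  # no gradient reaches the frozen block embeddings
optimizer.step()
```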

Exactly, but the pre-training part has not been fully ported to PyTorch, in particular the asynchronous MIPS index refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation, and then fine-tune in PyTorch.
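
For anyone unfamiliar with the missing piece: during pre-training the document embedder keeps changing, so the evidence-block embeddings used for MIPS retrieval go stale and have to be recomputed periodically (asynchronously, on separate workers, in the original TF implementation). Below is a deliberately toy, synchronous sketch of that refresh loop; the encoder, evidence blocks, and refresh interval are placeholders, not this repo's or the paper's actual components:

```python
# Toy, synchronous sketch of a MIPS index refresh (the real thing is asynchronous).
import torch
import torch.nn as nn

embed_dim, num_blocks = 128, 1000
doc_encoder = nn.EmbeddingBag(30522, embed_dim)  # stand-in for the BERT document embedder
evidence_blocks = torch.randint(0, 30522, (num_blocks, 64))  # toy token-id blocks

@torch.no_grad()
def refresh_index(encoder, blocks):
    """Re-embed every evidence block with the *current* encoder weights."""
    return encoder(blocks)  # (num_blocks, embed_dim)

block_emb = refresh_index(doc_encoder, evidence_blocks)

for step in range(1, 1001):
    # ... joint retriever/reader training step updates doc_encoder here ...
    if step % 500 == 0:
        # Periodically rebuild the index so retrieval uses fresh embeddings.
        block_emb = refresh_index(doc_encoder, evidence_blocks)

# Retrieval is then a maximum inner product search against the refreshed index:
query_emb = torch.randn(1, embed_dim)
top_k = torch.topk(query_emb @ block_emb.T, k=8).indices
```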

Thanks a lot for your insight. Anyway, this end-to-end fine-tuning would be very expensive.

@qqaatw is it part of the roadmap to port the pre-training part to PyTorch?

It was part of the roadmap, but now I'm wondering whether it is worth porting.

You can see the configuration of their experiments:

Pre-training: We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate of 3e-5, using BERT’s default optimizer. The document embedding step for the MIPS index is parallelized over 16 TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.

This setup leveraged an array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I don't think a regular deep learning workstation would be able to reproduce comparable results.

@qqaatw "It was part of the roadmap, but now I'm thinking whether this is worth port." Yeah, this seems a problem and I agree.