qqaatw/pytorch-realm-orqa

Does the code support the entire end-to-end fine-tuning, including the retriever?

Opened this issue · 7 comments

The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM model with this code?

If yes, we might be able to compare results with RAG-end2end:

https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever

As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means no index updates are performed at that stage. Therefore, for domain-specific QA, we would first have to pre-train REALM to obtain domain-specific evidence blocks (retriever), and then fine-tune it on a given dataset.
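
For reference, here is a minimal sketch of what a single fine-tuning step with a frozen evidence-block index looks like, assuming the Hugging Face `transformers` REALM classes that this repo builds on (the checkpoint name, question/answer strings, and the one-step training loop are purely illustrative, not this repo's exact API):

```python
# Minimal sketch (assumed API: Hugging Face transformers REALM classes).
# The evidence-block embeddings live in a fixed, pre-computed buffer, so
# fine-tuning never refreshes the MIPS index; gradients only reach the
# query embedder and the reader.
import torch
from transformers import RealmForOpenQA, RealmRetriever, RealmTokenizer

checkpoint = "google/realm-orqa-nq-openqa"  # illustrative released checkpoint
retriever = RealmRetriever.from_pretrained(checkpoint)  # frozen evidence blocks
tokenizer = RealmTokenizer.from_pretrained(checkpoint)
model = RealmForOpenQA.from_pretrained(checkpoint, retriever=retriever)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# RealmForOpenQA expects a single question per forward pass.
question_ids = tokenizer(
    ["Who is the pioneer in modern computer science?"], return_tensors="pt"
)
answer_ids = tokenizer(
    ["alan mathison turing"],
    add_special_tokens=False,
    return_token_type_ids=False,
    return_attention_mask=False,
).input_ids

reader_output, predicted_answer_ids = model(
    **question_ids, answer_ids=answer_ids, return_dict=False
)
reader_output.loss.backward()  # no gradient reaches the frozen block embeddings
optimizer.step()
```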

Exactly, but the pre-training part has not been fully ported to PyTorch, in particular the asynchronous MIPS index refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation, and then fine-tune in PyTorch.
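
For anyone unfamiliar with the missing piece: during pre-training the document embedder keeps changing, so the evidence-block embeddings used for MIPS retrieval go stale and have to be recomputed periodically (asynchronously, on separate workers, in the original TF implementation). Below is a deliberately toy, synchronous sketch of that refresh loop; the encoder, evidence blocks, and refresh interval are placeholders, not this repo's or the paper's actual components:

```python
# Toy, synchronous sketch of a MIPS index refresh (the real thing is asynchronous).
import torch
import torch.nn as nn

embed_dim, num_blocks = 128, 1000
doc_encoder = nn.EmbeddingBag(30522, embed_dim)  # stand-in for the BERT document embedder
evidence_blocks = torch.randint(0, 30522, (num_blocks, 64))  # toy token-id blocks

@torch.no_grad()
def refresh_index(encoder, blocks):
    """Re-embed every evidence block with the *current* encoder weights."""
    return encoder(blocks)  # (num_blocks, embed_dim)

block_emb = refresh_index(doc_encoder, evidence_blocks)

for step in range(1, 1001):
    # ... joint retriever/reader training step updates doc_encoder here ...
    if step % 500 == 0:
        # Periodically rebuild the index so retrieval uses fresh embeddings.
        block_emb = refresh_index(doc_encoder, evidence_blocks)

# Retrieval is then a maximum inner product search against the refreshed index:
query_emb = torch.randn(1, embed_dim)
top_k = torch.topk(query_emb @ block_emb.T, k=8).indices
```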

Thanks a lot for your insight. Anyway, this end-to-end fine-tuning would be very expensive.

@qqaatw is it part of the roadmap to port the pre-training part to PyTorch?

It was part of the roadmap, but now I'm wondering whether it is worth porting.

You can see the configuration of their experiments:

Pre-training: We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate of 3e-5, using BERT’s default optimizer. The document embedding step for the MIPS index is parallelized over 16 TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.

This setup leveraged an array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I don't think a regular deep learning workstation would be able to reproduce comparable results.

@qqaatw "It was part of the roadmap, but now I'm thinking whether this is worth port." Yeah, this seems a problem and I agree.