castorini/duobert

Do you have any code about how the checkpoint was pretrained?


As mentioned in the README, you provide a "bert-large-msmarco-pretrained_only.zip" checkpoint. Do you have any reference for how it was pre-trained?

We used the same pretraining code as the original BERT release:
https://github.com/google-research/bert/blob/master/create_pretraining_data.py
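
For reference, the data-creation step with that script looks roughly like this; it is a sketch, not the exact command, and the paths and masking parameters are placeholders:

```bash
# Rough sketch of the standard BERT data-creation step: paths are placeholders,
# and max_predictions_per_seq (~= max_seq_length * masked_lm_prob) is an assumed value.
python create_pretraining_data.py \
  --input_file=/path/to/msmarco_corpus.txt \
  --output_file=/path/to/msmarco_pretrain.tfrecord \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=512 \
  --max_predictions_per_seq=80 \
  --masked_lm_prob=0.15 \
  --dupe_factor=5
```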

A description of what we did is in section 4.1 (TCP) of https://arxiv.org/pdf/1910.14424.pdf

> ... we further pretrain the model on the MS MARCO corpus (0.5B words) for 100k iterations with a maximum sequence length of 512 tokens, batches of size 128, and learning rate of 5 × 10⁻⁵. This second pretraining phase takes approximately 24 hours on a TPU v3.
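
Plugged into the standard `run_pretraining.py` from the same repo, those hyperparameters correspond to a run roughly like the following sketch; the paths, number of warmup steps, and TPU name are placeholders rather than the exact setup:

```bash
# Sketch of the further-pretraining run with the hyperparameters quoted above
# (100k steps, sequence length 512, batch size 128, learning rate 5e-5, TPU v3).
# Paths, the warmup step count, and the TPU name are placeholders.
python run_pretraining.py \
  --input_file=/path/to/msmarco_pretrain.tfrecord \
  --output_dir=/path/to/bert_large_msmarco_pretrained_only \
  --do_train=True \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --train_batch_size=128 \
  --max_seq_length=512 \
  --max_predictions_per_seq=80 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --learning_rate=5e-5 \
  --use_tpu=True \
  --tpu_name=my-tpu-v3
```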

Thanks for the quick reply, Rodrigo!