LaReQA (XQuAD-R) results reproducibility for mBERT
Doragd opened this issue · 5 comments
@sebastianruder @nconstant-google
Sorry to bother you. There are some details that are not clear to me. Could you answer the following questions?
========================================================
I found the training and evaluation scripts for LaReQA at
https://github.com/google-research/xtreme/blob/master/scripts/train_lareqa.sh
https://github.com/google-research/xtreme/blob/master/scripts/run_eval_lareqa.sh
Are these scripts and the source code associated with this paper?
Is there any difference compared to the implementation described in the original paper?
=========================================================
For XQuAD-R, I would like to know the K value used for mAP@K.
Is it equal to the number of correct candidates, i.e. 11 relevant answers ==> mAP@11?
However, the metric in this repo's source code is set to 20, i.e. mAP@20.
Actually, I would like to reproduce the En-En result for mBERT (mAP=0.29).
I do not know the K value used in the original paper.
=========================================================
Thanks a lot ~
The LAReQA scripts in this repo can be used to reproduce the XTREME-R paper experiments.
The original LAReQA paper uses "full" mAP, with no cutoff K. To match that setting, you can remove k=20 on this line.
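For concreteness, here is a minimal sketch of the difference between "full" mAP and mAP@K, assuming each query comes with a score-ranked candidate pool and a set of gold answers. The function names and signatures below are illustrative, not the actual API in this repo's evaluation code; also note that some mAP@K definitions normalize by min(|relevant|, K) rather than |relevant|.

```python
# Sketch only: illustrative names, not the xtreme repo's evaluation API.
from typing import Iterable, List, Optional, Set, Tuple

def average_precision(ranked_ids: List[str], relevant_ids: Set[str],
                      k: Optional[int] = None) -> float:
    """AP for one query. ranked_ids is the candidate pool sorted by
    descending retrieval score; relevant_ids are the gold answers."""
    if k is not None:
        ranked_ids = ranked_ids[:k]  # mAP@K truncates the ranking; k=None is "full" mAP
    hits, precision_sum = 0, 0.0
    for rank, cand in enumerate(ranked_ids, start=1):
        if cand in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    # Normalized by the total number of relevant answers (11 per question
    # in XQuAD-R, one per language). Some mAP@K variants use min(len, k).
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]],
                           k: Optional[int] = None) -> float:
    """runs: iterable of (ranked_ids, relevant_ids) pairs, one per query."""
    runs = list(runs)
    return sum(average_precision(r, g, k) for r, g in runs) / len(runs)

# Example: two relevant answers among four candidates.
ranked = ["a3", "a1", "a7", "a2"]            # sorted by score
gold = {"a1", "a2"}
print(average_precision(ranked, gold))        # full mAP: (1/2 + 2/4) / 2 = 0.5
print(average_precision(ranked, gold, k=2))   # mAP@2:   (1/2) / 2       = 0.25
```

Passing k=None reproduces the original LAReQA setting, while k=20 matches this repo's default.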
For the X-X baseline, I noticed this mention in the original paper:
Our second baseline “X-X” extends the same SQuAD train set by translating each example into the 11 XQuAD languages using an in-house translation system.
Is the translated SQuAD train set the same as the data at https://console.cloud.google.com/storage/browser/xtreme_translations ?
The translations are not identical, but they should be of similar quality.
We don't plan to run the translate-train experiment for LaReQA. Feel free to use the numbers from the LaReQA paper.