LaReQA (XQuAD-R) results reproducibility for mBERT
Doragd opened this issue · 5 comments
@sebastianruder @nconstant-google
Sorry to bother you. There are some details that are not clear to me. Could you answer the following questions?
========================================================
I found the training and evaluation scripts for LaReQA at
https://github.com/google-research/xtreme/blob/master/scripts/train_lareqa.sh
https://github.com/google-research/xtreme/blob/master/scripts/run_eval_lareqa.sh
Are these scripts and the source code associated with this paper?
Is there any difference compared to the implementation described in the original paper?
=========================================================
For XQuAD-R, I would like to know the K value used for mAP@K.
Is it equal to the number of correct candidates, i.e. 11 relevant answers ==> mAP@11?
However, the metric in this repo's source code is set to 20, i.e. mAP@20.
Actually, I would like to reproduce the En-En result for mBERT (mAP=0.29).
I do not know the K value used in the original paper.
=========================================================
Thanks a lot ~
The LAReQA scripts in this repo can be used to reproduce the XTREME-R paper experiments.
The original LAReQA paper uses "full" mAP, with no cutoff K. To match that setting, you can remove k=20 on this line.
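For concreteness, here is a minimal sketch of the difference between "full" mAP and mAP@K, assuming each query comes with a score-ranked candidate pool and a set of gold answers. The function names and signatures below are illustrative, not the actual API in this repo's evaluation code; also note that some mAP@K definitions normalize by min(|relevant|, K) rather than |relevant|.

```python
# Sketch only: illustrative names, not the xtreme repo's evaluation API.
from typing import Iterable, List, Optional, Set, Tuple

def average_precision(ranked_ids: List[str], relevant_ids: Set[str],
                      k: Optional[int] = None) -> float:
    """AP for one query. ranked_ids is the candidate pool sorted by
    descending retrieval score; relevant_ids are the gold answers."""
    if k is not None:
        ranked_ids = ranked_ids[:k]  # mAP@K truncates the ranking; k=None is "full" mAP
    hits, precision_sum = 0, 0.0
    for rank, cand in enumerate(ranked_ids, start=1):
        if cand in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    # Normalized by the total number of relevant answers (11 per question
    # in XQuAD-R, one per language). Some mAP@K variants use min(len, k).
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]],
                           k: Optional[int] = None) -> float:
    """runs: iterable of (ranked_ids, relevant_ids) pairs, one per query."""
    runs = list(runs)
    return sum(average_precision(r, g, k) for r, g in runs) / len(runs)

# Example: two relevant answers among four candidates.
ranked = ["a3", "a1", "a7", "a2"]            # sorted by score
gold = {"a1", "a2"}
print(average_precision(ranked, gold))        # full mAP: (1/2 + 2/4) / 2 = 0.5
print(average_precision(ranked, gold, k=2))   # mAP@2:   (1/2) / 2       = 0.25
```

Passing k=None reproduces the original LAReQA setting, while k=20 matches this repo's default.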
For the X-X baseline, I noticed this mention in the original paper:
Our second baseline “X-X” extends the same SQuAD train set by translating each example into the 11 XQuAD languages using an in-house translation system.
Is the translated SQuAD train set the same as the data at https://console.cloud.google.com/storage/browser/xtreme_translations ?
The translations are not identical, but they should be of similar quality.
We don't plan to run the translate-train experiment for LaReQA. Feel free to use the numbers from the LaReQA paper.