Publish the baseline results for each language pair?

Question


ftyers opened this issue 4 years ago · 4 comments

It would be useful for sanity-checking the results we get, e.g. to confirm that we have the baseline set up correctly.

It might also be good to publish the random seed for the same purpose.

Answer 1 · 2021-01-09T15:27:25.000Z

I'll update this issue as we get them:

Pair	Epochs	Converged?	chrF2	BLEU
Spanish→Aymara	20	No	0.176	0.96
Spanish→Aymara	30	Yes	0.211	1.94
Spanish→Bribri	30	?	0.239	8.85
Spanish→Nahuatl	30	?	0.276	5.21
Spanish→Hñähñu	30	?	0.228	4.17
Spanish→Quechua	30	Yes	0.343	12.60
Spanish→Shipibo-Konibo	30	Yes	0.174	0.38
Spanish→Raramuri	30	?	0.242	5.32
Spanish→Wixarika	30	Yes	0.296	14.33

These runs use the first 200 sentences of each training set as dev, the next 200 as test, and the remainder as train.
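In case it helps anyone reproduce the split, here is a minimal sketch of it in Python. The function name `split_parallel` and the representation of the data as a list of (source, target) tuples are my own assumptions, not part of the baseline code.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence); assumed layout


def split_parallel(
    pairs: List[Pair], dev_size: int = 200, test_size: int = 200
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split in file order: first block is dev, second is test, rest is train."""
    held_out = dev_size + test_size
    if len(pairs) <= held_out:
        raise ValueError("not enough sentence pairs to leave a training set")
    dev = pairs[:dev_size]
    test = pairs[dev_size:held_out]
    train = pairs[held_out:]
    return train, dev, test


if __name__ == "__main__":
    # Placeholder data only, to show the resulting sizes.
    toy = [(f"src {i}", f"tgt {i}") for i in range(1000)]
    train, dev, test = split_parallel(toy)
    print(len(train), len(dev), len(test))
```

Because the split is taken in file order with no shuffling, it does not depend on a random seed.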

Answer 2 · 2021-01-11T05:06:13.000Z

Thank you very much. This is a good idea. Next week we will publish the baseline values. I have a question regarding Aymara: why do you have two experiments for it?

Answer 3 · 2021-01-11T05:10:10.000Z

No problem! :) As for the two values, we did one with 20 epochs, but noticed that the loss on the dev set didn't converge, so we tried with 30 and it seemed to converge.
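For anyone who wants an automatic check instead of eyeballing the curve, one possible criterion is sketched below. It is only an assumption about what "converged" could mean here: the name `has_converged` and the `patience` and `min_delta` parameters are hypothetical and do not come from the baseline setup.

```python
from typing import Sequence


def has_converged(
    dev_losses: Sequence[float], patience: int = 3, min_delta: float = 1e-3
) -> bool:
    """Return True if the last `patience` epochs did not improve the best
    earlier dev loss by at least `min_delta`."""
    if len(dev_losses) <= patience:
        return False  # too few epochs to judge
    best_before = min(dev_losses[:-patience])
    best_recent = min(dev_losses[-patience:])
    return best_before - best_recent < min_delta


if __name__ == "__main__":
    # Made-up loss values, for illustration only.
    print(has_converged([5.0, 4.0, 3.0, 2.0, 1.0]))
    print(has_converged([5.0, 3.0, 2.0, 2.0, 2.0, 2.0]))
```

With a check like this, a run whose dev loss is still dropping at the final epoch would be reported as not converged, which is the situation described for the 20-epoch run.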

Answer 4 · 2021-03-13T15:06:45.000Z

Sorry, I just saw that this issue was still open. The baseline for all languages is online. :) Thanks a lot!