Helsinki-NLP/Tatoeba-Challenge

Download data for specific language pair

bricksdont opened this issue · 1 comment

Dear Jörg and colleagues,

I would like to download the training and validation data used for a specific language pair and setting (one that is already covered by a pre-trained Tatoeba model). Example model I'd like to download the data for:

https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/deu-eng

If possible, I'd like to avoid downloading the entire OPUS collection for all language pairs and settings.

Thanks for your help!

P.S. I will close this issue if I figure out how to do it.

Found a way to do it.

Look in Data.md:

https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/Data.md

and find the link to the TAR file for the language pair you need, such as

https://object.pouta.csc.fi/Tatoeba-Challenge/deu-eng.tar

I am assuming that's the correct way :)
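
For anyone who wants to script this, here is a minimal sketch in Python using only the standard library. It assumes that every language pair follows the same URL pattern as the deu-eng link above, which I have not verified; if in doubt, copy the exact link from Data.md instead. The function names (`tar_url`, `download`, `list_members`) are made up for this example, and I have not checked what the archive contains, so the last helper only lists the file names inside it.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Base URL taken from the deu-eng link above. That other pairs live
# under the same prefix is an assumption; check Data.md to be sure.
BASE_URL = "https://object.pouta.csc.fi/Tatoeba-Challenge"


def tar_url(src: str, tgt: str) -> str:
    """Build the (assumed) download URL for one language pair."""
    return f"{BASE_URL}/{src}-{tgt}.tar"


def download(url: str, dest: Path) -> Path:
    """Stream the file at `url` to `dest` without loading it into memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def list_members(archive: Path) -> list[str]:
    """Return the names of all entries in a TAR archive."""
    with tarfile.open(archive) as tar:
        return tar.getnames()
```

Calling `download(tar_url("deu", "eng"), Path("deu-eng.tar"))` and then `list_members(Path("deu-eng.tar"))` should fetch only that one pair and show which files it contains. The archive may be large, so make sure there is enough disk space first.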