facebookresearch/stopes

NLLB mined data?

gordicaleksa opened this issue · 1 comment

Hi!

Did you ever release the mined data behind the NLLB project? (As mentioned in Section 5.4 of the paper, that's roughly 1.1B sentence pairs.)

Thank you!

Apologies, I missed the metadata README that you already released.

I also found that AllenAI replicated the dataset and released it to HuggingFace: https://huggingface.co/datasets/allenai/nllb