thammegowda/mtdata

Add `allenai/nllb` dataset

Closed this issue · 2 comments

@thammegowda

So the options are:

  1. Including the HF `datasets` library as a dependency (which is quite large)
  2. Reverse-engineering the direct download links to the dataset files

Alternatively, they also provide an option to fetch it through git-lfs, and there's a library for that: https://pypi.org/project/git-lfs/. What do you think?

I think (2), reverse-engineering the links and adding them to mtdata, would be preferred; that way we don't have to pull in all the dependencies of HF datasets.
If (2) is not feasible or is too complicated, we can consider (1).
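
For reference, a minimal sketch of what (2) could look like, assuming the files are hosted in the dataset repository on the Hugging Face Hub itself. The Hub serves repository files at `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<path>`, so a plain URL can be built without the `datasets` library. The helper name `hf_resolve_url` and the `README.md` path are placeholders for illustration, not actual mtdata code or actual data files. If the dataset's loader script points to external storage instead, the links would have to be read from that script.

```python
from urllib.parse import quote

# Base of the Hub's file-resolution endpoint for dataset repositories.
HF_DATASETS_BASE = "https://huggingface.co/datasets"


def hf_resolve_url(repo_id: str, path: str, revision: str = "main") -> str:
    """Build a direct download URL for one file in a Hub dataset repo.

    repo_id  : "<owner>/<name>", e.g. "allenai/nllb"
    path     : file path inside the repo (placeholder in the example below)
    revision : branch, tag, or commit hash
    """
    # Keep "/" in repo_id and path; escape everything else that needs it.
    return (
        f"{HF_DATASETS_BASE}/{quote(repo_id)}"
        f"/resolve/{quote(revision, safe='')}/{quote(path)}"
    )


if __name__ == "__main__":
    # "README.md" is only a stand-in path, not a data file.
    print(hf_resolve_url("allenai/nllb", "README.md"))
```

Pinning `revision` to a commit hash rather than `main` would keep the resulting links stable if the repository changes later.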