revdotcom/speech-datasets

Releases of the individual datasets?

Closed this issue · 2 comments

Hi, I think it would be great if you could also prepare releases of the individual corpora (earnings21, earnings22).
People could then download only the part they care about instead of a zip of the whole repo (or cloning the whole repo with git).
Just for your consideration, but I think you should do it :)

Sorry for the late response. Those solutions could work, but I think the barrier to entry is quite high (they need a reasonably recent git with sparse checkout, plus git-lfs). Also, I added recipes for earnings21 and earnings22 to lhotse (https://github.com/lhotse-speech/lhotse), and I feel that being able to download a single zip file release would be great. But I don't know what GitHub's limits are for this, so it might just not be possible.