Each dataset comes from Open Speech and Language Resources (OpenSLR).
Each dataset is listed in files.list and md5sum.txt, both of which were downloaded from www.openslr.org/12/.
Some of these are BIG. Take care when deciding to build or download them!
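
Because the downloads are large, it can be worth checking them against md5sum.txt before building anything. The following is a minimal sketch, not part of this repository. It assumes md5sum.txt uses the usual `md5sum` layout (a hex digest, whitespace, then a file name on each line) and that the downloaded files sit in a single directory. The function names `parse_md5sum_lines`, `md5_of_file`, and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_md5sum_lines(text):
    """Parse md5sum-style lines ("<hex digest>  <file name>") into a dict.

    Assumption: md5sum.txt follows this layout; adjust if it does not.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # A leading "*" marks binary mode in md5sum output; drop it.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(directory, md5sum_path):
    """Return the names of files that are missing or whose checksum differs."""
    expected = parse_md5sum_lines(Path(md5sum_path).read_text())
    failures = []
    for name, digest in expected.items():
        candidate = Path(directory) / name
        if not candidate.is_file() or md5_of_file(candidate) != digest:
            failures.append(name)
    return failures
```

Calling `verify("downloads", "md5sum.txt")` (with a hypothetical `downloads` directory) returns an empty list when every listed file is present and matches its checksum.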
The Dockerfiles are automatically generated from the Makefile.
CC-BY-4.0: https://creativecommons.org/licenses/by/4.0/