malteos/llm-datasets
A collection of datasets for language model pretraining, including scripts for downloading, preprocessing, and sampling.
Python · Apache-2.0