/data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Primary language: Jupyter Notebook
License: Apache License 2.0 (Apache-2.0)