Roots Data Download

Sample data from the ROOTS corpus.

[1] Go to the dataset section of https://huggingface.co/bigscience-data

[2] Open each data split and accept the BigScience Ethical Charter. Otherwise you won't be able to download the data.

[3] Open the notebook. Load and sample the data as you wish. Note that the notebook supports sampling from a multinomial distribution.
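
To illustrate what sampling from a multinomial distribution across data splits can look like, here is a minimal sketch. It is not the notebook's code: the function names `multinomial_counts` and `sample_rows`, the split names, and the weights are all hypothetical, and the splits are small in-memory lists standing in for downloaded data.

```python
import random
from collections import Counter


def multinomial_counts(weights, n_samples, seed=0):
    """Return how many samples to take from each split.

    Each of the n_samples draws picks one split with probability
    proportional to its weight, so the counts follow a multinomial
    distribution.
    """
    rng = random.Random(seed)
    names = list(weights)
    draws = rng.choices(names, weights=[weights[n] for n in names], k=n_samples)
    counts = Counter(draws)
    return {name: counts.get(name, 0) for name in names}


def sample_rows(splits, weights, n_samples, seed=0):
    """Sample rows from several splits according to multinomial counts."""
    rng = random.Random(seed)
    counts = multinomial_counts(weights, n_samples, seed)
    sampled = []
    for name, k in counts.items():
        rows = splits[name]
        # Sample with replacement so k may exceed the size of a small split.
        sampled.extend(rng.choice(rows) for _ in range(k))
    return sampled


# Hypothetical stand-ins for loaded splits (assumption, not real ROOTS data).
example_splits = {
    "split_a": ["a1", "a2", "a3"],
    "split_b": ["b1", "b2"],
}
# Hypothetical mixing weights; they need not sum to 1.
example_weights = {"split_a": 0.7, "split_b": 0.3}

example_sample = sample_rows(example_splits, example_weights, n_samples=10)
```

The weights control the expected share of each split in the sample; the actual counts vary from one seed to another.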