| language | tags | datasets |
|---|---|---|
| es | | oscar |
A GPT-2 model pre-trained from scratch on the Spanish portion of OSCAR during the Flax x Hugging Face community event by @mariagrandury, @mrm8488, @pablogps, @daveni, @srisweet, @jdposa, @shpotes, and @jorgealro.
The architecture is OpenAI's GPT-2, introduced in the paper "Language Models are Unsupervised Multitask Learners" by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
This model is available in the 🤗 Model Hub.
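As a minimal sketch of how the model could be loaded from the Hub with `transformers` (the repository id `flax-community/gpt-2-spanish` is an assumption; substitute the actual Model Hub id):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Hypothetical repository id -- replace with the model's actual 🤗 Model Hub id.
model_id = "flax-community/gpt-2-spanish"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Spanish continuation from a prompt.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("La inteligencia artificial", max_length=30)[0]["generated_text"])
```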
The training data is the Spanish portion of OSCAR (Open Super-large Crawled ALMAnaCH coRpus), a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the goclassy architecture.
This corpus is available in the 🤗 Datasets library.
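A sketch of loading the Spanish portion of OSCAR with 🤗 Datasets, using streaming to avoid downloading the full corpus up front (the config name `unshuffled_deduplicated_es` is an assumption; check the dataset card for the exact identifier):

```python
from datasets import load_dataset

# Stream the Spanish OSCAR split instead of downloading it in full.
# "unshuffled_deduplicated_es" is an assumed config name; verify it on
# the dataset card before use.
dataset = load_dataset(
    "oscar", "unshuffled_deduplicated_es", split="train", streaming=True
)

# Inspect the beginning of the first example.
print(next(iter(dataset))["text"][:200])
```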