huggingface/olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
Python · Apache-2.0
Stargazers
- Adloya (La Guerre Des Lits)
- aflah02 (Indraprastha Institute of Information Technology Delhi)
- AhmedIdr
- alabarga (Experimental Serendipity)
- blockchainchad (C Industries)
- Danialgharaee
- finlaymacklon (Canada)
- fly51fly (PRIS)
- frankfanslc (Fatpipe Networks)
- Harleymckee
- hyunwoongko (@kakao)
- joeljang (University of Washington)
- ju-resplande (Federal University of Goiás)
- jubbon
- krylm (LMC)
- KushtrimVisoka (NORA.ai)
- malteos (Berlin, Germany)
- michalski-luc
- michalwols (New York)
- MM-IR (University of California, San Diego)
- mmizutani (Tokyo)
- mryab
- nabarunbarua (AIML)
- nawnoes (Seoul, Korea)
- ola13 (@huggingface)
- philschmid (@huggingface)
- rahular (@mcgill-nlp @google)
- scottsuk0306 (@kaistAI)
- Se-Hun (HanwHa Life)
- snoop2head (KAIST AI)
- soma2000-lang (@unifyai)
- sooftware (@tunib-ai)
- SOUMAJYOTI (AWS)
- thevasudevgupta (@Unbox-AI)
- u-brixton
- upskyy (ReturnZero Inc.)