huggingface/OBELICS
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
Python · Apache-2.0
Stargazers
- 152334H (National University of Singapore)
- AkshayBhatAI (New York City)
- anas-awadalla (Seattle, Washington)
- budui (Tencent)
- ChenghaoMou (Docusign)
- DaehanKim
- donglixp (Microsoft Research)
- evdcush
- FFengIll (Galaxy)
- fly51fly (PRIS)
- hellbell (Naver AI Lab)
- HugoLaurencon (Hugging Face)
- i-gao (Stanford University)
- intfloat (Peking University)
- itsliupeng (Beijing)
- jeromeku
- jgraham1989
- JingyeChen (Microsoft Research Asia)
- kocoten1992
- limitty
- michalwols (New York)
- moskomule (RIKEN AIP)
- Natyren
- OpenAndrusAndrusB
- SandalotsVolcanak
- sangkilpark-kidmam
- SaulLu (Hugging Face)
- snoop2head (KAIST AI)
- soma2000-lang (@unifyai)
- stas00 (Stasosphere Online Inc. / Contextual.AI)
- tkersey (@thisisartium)
- tuofeilunhifi (Li Auto)
- vishaal27 (University of Tübingen | University of Cambridge)
- yangfawen (Kivisense)
- yihaocs (Salesforce Research)
- ZubinGou (Tsinghua University)