jcpeterson/openwebtext
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
Python, GPL-3.0
Stargazers
- ajohnclark
- amorgun
- andy-yangz (Tencent)
- brunojm (Brazil)
- carloshpf (São Paulo - SP - Brazil)
- cdharris (Berlin / Europe)
- combiz (London, UK)
- ddbourgin (Brooklyn, NY)
- dxe4 (London UK)
- farizikhwantri (Tokyo Institute of Technology)
- fbparis
- fijimunkii (NYC)
- fly51fly (PRIS)
- G-Wang (Google)
- ghosthamlet (The Rest Is Silence of Code)
- GitHub30 (Osaka, Japan)
- gkiril (NEC Laboratories Europe)
- hoagy-davis-digges
- hrbrmstr (GreyNoise Intelligence)
- Isinlor (Plain Complex)
- jjhenkel (@sema4)
- joluwatosin
- justinjm (@google)
- lopuhin (Zyte)
- luiscosio (México)
- Microsheep (National Chiao Tung University)
- moebg (@ironclad)
- mttk (Technion)
- okuchaiev (@nvidia)
- paul-english (Salt Lake City, UT)
- reefactor (autofaq.ai)
- skeeet (Bay Area)
- snd (Earth)
- StackTraceYo (Columbus, Ohio)
- thunn
- tluyben