/openwebtext

Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
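
As a rough illustration of the file-based approach, the sketch below filters submission records read from a dump file rather than fetched through an API. It assumes newline-delimited JSON records with `url`, `score`, and `is_self` fields. The `extract_urls` helper and the score threshold of 3 are hypothetical and are not taken from this repository's code.

```python
import json

# Assumption: a WebText-style minimum karma threshold; not stated in this README.
MIN_SCORE = 3


def extract_urls(lines, min_score=MIN_SCORE):
    """Yield outbound URLs from newline-delimited JSON submission records.

    Hypothetical helper for illustration; not this repository's API.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            post = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(post, dict):
            continue  # skip JSON values that are not submission objects
        if post.get("is_self"):
            continue  # self posts have no outbound link
        url = post.get("url")
        score = post.get("score") or 0
        if url and score >= min_score:
            yield url


# Small in-memory sample standing in for lines read from a decompressed dump file.
sample = [
    '{"url": "https://example.com/a", "score": 5, "is_self": false}',
    '{"url": "https://example.com/b", "score": 1, "is_self": false}',
    '{"url": "https://www.reddit.com/r/x/comments/1", "score": 9, "is_self": true}',
    "not json",
]
urls = list(extract_urls(sample))
print(urls)
```

In practice the lines would come from a decompressed dump file opened in text mode, so the same generator can stream a large file without loading it into memory.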

Primary language: Python
License: GNU General Public License v3.0 (GPL-3.0)
