OSCAR-CommonCrawl-Collab

This repository contains notes, documents and reports on the collaboration between CommonCrawl and OSCAR on a new format for text extracted from CommonCrawl WARC files.