/NeuScraper

This repository contains the code for our paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".

Primary language: Python

License: MIT
