cfoster0/humongous-rs
A Rust pipeline for extracting HUMONGOUS, a dataset of web text drawn from Common Crawl and prepared for multilingual language modeling.
Rust