Pinned Repositories
corpus
Corpus issues.
documentation
download_oscar
Download all files for a language from OSCAR (Open Super-large Crawled Aggregated coRpus)
goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
oscar-blocklists
A compilation of multilingual URL blocklists
OSCAR-CommonCrawl-Collab
oscar-statistics
Compute statistics for OSCAR Monthly releases
oscar-tools
The original tooling for the OSCAR corpus rewritten in Rust
oscar-website
The website of the OSCAR Project
ungoliant
:spider: The pipeline for the OSCAR corpus
OSCAR's Repositories
oscar-project/ungoliant
:spider: The pipeline for the OSCAR corpus
oscar-project/goclassy
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
oscar-project/oscar-website
The website of the OSCAR Project
oscar-project/download_oscar
Download all files for a language from OSCAR (Open Super-large Crawled Aggregated coRpus)
oscar-project/corpus
Corpus issues.
oscar-project/documentation
oscar-project/oscar-tools
The original tooling for the OSCAR corpus rewritten in Rust
oscar-project/oscar-blocklists
A compilation of multilingual URL blocklists
oscar-project/OSCAR-CommonCrawl-Collab
oscar-project/oscar-statistics
Compute statistics for OSCAR Monthly releases
oscar-project/data-hub
Collaboration around OSCAR: data sourcing.
oscar-project/oscar-tools-go
Tooling for the OSCAR corpus
oscar-project/ut1-rs
Rust library for the UT1 blocklist
oscar-project/oscar-io
Readers/Writers for OSCAR Corpus
oscar-project/.github
oscar-project/discussions