glotlid
There are 5 repositories under the glotlid topic.
cisnlp/GlotLID
💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023
cisnlp/GlotCC
🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
cisnlp/ungoliant
🕷 The pipeline for the OSCAR/GlotCC corpus
cisnlp/oscar-io
Readers/Writers for GlotCC/OSCAR corpus
cisnlp/oscar-tools
The original tooling for the GlotCC/OSCAR corpus, rewritten in Rust