Issues
Standard Moroccan Tamazight is mislabeled
#28 opened by MedAymenF - 0
Provide context for input and output
#29 opened by joiemoie - 0
Native language visualizations
#6 opened by vienneraphael - 4
Weird line length spikes in Serbian, Croatian, Bosnian (data analysis task)
#22 opened by gordicaleksa - 0
[Modeling] Release a 615M HBS (Croatian, Bosnian, Serbian) Open-NLLB checkpoint
#15 opened by gordicaleksa - 0
[Data] Acquire additional high-quality (non-public) parallel corpora for HBS
#19 opened by gordicaleksa - 0
[Future - outside current project scope] non-English LLMs (Serbian LLM, etc.)
#21 opened by gordicaleksa - 0
[Future - outside current project scope] 7B lang-family-specific Open-NLLB checkpoint
#20 opened by gordicaleksa - 0
Get a compute grant
#14 opened by gordicaleksa - 0
sub-batches creation
#4 opened by vienneraphael - 0
Obtain high quality Serbian parallel corpus (currently 0 support in our public bi-text)
#10 opened by gordicaleksa - 0
Hydra pickle issue in generate_multi.py
#3 opened by vienneraphael - 1
Spanish and Guarani filtering
#5 opened by vienneraphael - 0
Choosing the LID model
#8 opened by vienneraphael - 0
LID model peak probabilities
#7 opened by vienneraphael