thammegowda/mtdata
A tool that locates, downloads, and extracts machine translation corpora
Python · Apache-2.0
Issues
- 0
Depend on external lib for language standardization
#159 opened by AlexUmnov - 1
Allow strict langpair ordering
#157 opened by erip - 0
Update Tatoeba corpus
#153 opened by jeanm - 0
Add TALPCo
#152 opened by kpu - 0
Add Thai-English parallel corpus "scb-mt-en-th-2020"
#151 opened by kpu - 2
- 4
How to add in missing parts of datasets
#148 opened by arvieFrydenlund - 1
Add `echo` task
#139 opened by thammegowda - 1
Add support for monolingual data
#140 opened by thammegowda - 1
Add NTREX-128
#144 opened by thammegowda - 2
Index store bibkey and not the bibtext content
#147 opened by thammegowda - 3
Add Samanantar datasets.
#142 opened by BrightXiaoHan - 1
Travis build is broken
#136 opened by thammegowda - 0
Faster downloads with multiple streams
#141 opened by thammegowda - 3
JW300 taken down from OPUS
#77 opened by kpu - 2
Add `allenai/nllb` dataset
#133 opened by ZenBel - 1
Return non-zero on error
#122 opened by kpu - 2
- 1
AI4Bharat link is down
#119 opened by thammegowda - 1
Add EU acts in Ukrainian
#121 opened by thammegowda - 0
Add MaCoCu corpora
#128 opened by ZJaume - 0
Add ebible corpus
#131 opened by joelthe1 - 5
- 0
Add mni-eng parallel data
#127 opened by kpu - 0
Add gn-es parallel data
#126 opened by kpu - 2
Parallel Corpora for 6 Indian Languages
#107 opened by kpu - 2
ELRC-portal_oficial_turismo_españa_www.spain.info-1-eng-por doesn't contain eng-por
#101 opened by XapaJIaMnu - 5
Cannot Download wmt21 en2zh test data
#116 opened by Pzzzzz5142 - 0
Add ParaCrawl Ukrainian bonus
#113 opened by kpu - 5
Trying to use mtdata with python
#114 opened by MathieuGrosso - 0
Add visualizations in search results
#104 opened by thammegowda - 2
ELRC-euipo_law-1-eng-fra hits 403 (forbidden)
#97 opened by XapaJIaMnu - 1
- 1
Anuvaad-zee-30042021-eng-ben ERROR:: Unable to add Anuvaad-zee-30042021-eng-ben: en-bn/*.en matched []; expected one file
#108 opened by XapaJIaMnu - 0
- 2
Policy on BCP-47 in TMX files?
#98 opened by kpu - 1
- 4
Add datasets listed by Stanford NMT
#84 opened by thammegowda - 0
mtdata list : filters
#91 opened by thammegowda - 0
Add wmt21 tests
#90 opened by thammegowda - 0
Add wikititles v3
#86 opened by thammegowda - 0
Add wmt21 ha-en corpus
#87 opened by thammegowda - 0
Add wmt21 ccaligned datasets
#89 opened by thammegowda - 0
Add ParIce dataset (en-is)
#88 opened by thammegowda - 1
mtdata get ignores the language pair when the dataset has only one language pair
#94 opened by XapaJIaMnu - 0
- 1
Add parallel bible corpus
#80 opened by thammegowda - 1
Anuvaad Parallel Corpus for Indian languages
#81 opened by GokulNC - 0
recipes.yml is not packed in pip package
#82 opened by thammegowda - 0