AmericasNLP 2021 Shared Task on Open Machine Translation

This is the official repository for the AmericasNLP 2021 Shared Task on Open Machine Translation. All scripts have been tested with Python 3.8.5, and requirements will be updated accordingly.

An example of data in the shared task's format can be found in pilot_data/, and evaluate.py illustrates the metrics and evaluations that will be used for submitted MT systems.

Data sources

If you use one or more of the datasets included in this repository, please cite each of the original papers.

Nahuatl: Gutierrez-Vasques, X., Sierra, G., & Pompa, I. H. (2016). Axolotl: a Web Accessible Parallel Corpus for Spanish-Nahuatl. In LREC.
Hñähñu online corpus: https://tsunkua.elotl.mx/about/
Wixarika: Mager, M., Carrillo, D., & Meza, I. (2018). Probabilistic finite-state morphological segmenter for Wixarika (Huichol) language. Journal of Intelligent & Fuzzy Systems, 34(5), 3081-3087.
Guaraní: Chiruzzo, L., Amarilla, P., Ríos, A., & Lugo, G. G. (2020, May). Development of a Guarani-Spanish Parallel Corpus. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 2629-2633).
Bribri: Feldman, I., & Coto-Solano, R. (2020, December). Neural Machine Translation Models with Back-Translation for the Extremely Low-Resource Indigenous Language Bribri. In Proceedings of the 28th International Conference on Computational Linguistics (pp. 3965-3976).
Quechua: Agić, Ž., & Vulić, I. (2019). JW300: A wide-coverage parallel corpus for low-resource languages. In ACL 2019.
Aymara (GlobalVoices): Tiedemann, J. (2012, May). Parallel Data, Tools and Interfaces in OPUS. In LREC (Vol. 2012, pp. 2214-2218).
Shipibo-konibo: Galarreta, A. P., Melgar, A., & Oncevay-Marcos, A. (2017, September). Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo. In RANLP (pp. 238-244).
