MT on Low-Resource Languages

This repository accompanies the following research:

Investigating Phrase-Based and Neural-Based Machine Translation on Low-Resource Settings

We investigate phrase-based and neural-based machine translation on low-resource language pairs, that is, pairs for which only small bilingual corpora are available. Experiments were conducted on the following language pairs:

  1. Japanese-English

  2. English-Vietnamese

  3. Indonesian-Vietnamese

  4. Czech-Vietnamese

Dependencies

https://github.com/rsennrich/nematus

https://github.com/rsennrich/subword-nmt

https://github.com/moses-smt/mosesdecoder

References

[1] Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry and Maria Nadejde (2017): Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Demonstrations at the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain.

[2] Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Edinburgh Neural Machine Translation Systems for WMT 16. Proceedings of the First Conference on Machine Translation (WMT16). Berlin, Germany.

[3] Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the International Conference on Learning Representations (ICLR).

[4] Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany.

[5] Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Improving Neural Machine Translation Models with Monolingual Data. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany.

[6] Cettolo, M.; Niehues, J.; Stüker, S.; Bentivogli, L.; Cattoni, R.; and Federico, M. 2015. The IWSLT 2015 evaluation campaign. Proceedings of the International Workshop on Spoken Language Translation (IWSLT). https://sites.google.com/site/iwsltevaluation2015/mt-track, https://wit3.fbk.eu/mt.php?release=2015-01

[7] Neubig, G. 2011. The Kyoto free translation task. http://www.phontron.com/kftt.

[8] Thu, Y. K.; Pa, W. P.; Utiyama, M.; Finch, A.; and Sumita, E. 2016. Introducing the Asian Language Treebank (ALT). In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC), 1574–1578. http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/

[9] VLSP Project: https://vlsp.hpda.vn/demo/?&lang=en

[10] Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 48–54. Association for Computational Linguistics.

[11] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL, pages 177–180. Association for Computational Linguistics.