Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Neural Models for Automatic Diacritics Restoration

This repository contains code and data for training and evaluating neural models for diacritics restoration. It accompanies our AIST'2020 paper "A Comparison of Neural Networks Architectures for Diacritics Restoration", which is available on the conference site and in this repository.
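As an illustration of the task setup only, here is a minimal sketch of how a training pair for diacritics restoration could be built from ordinary text: the diacritized text is the target, and a stripped copy is the input. This is an assumption about a typical pipeline, not code from this repository; `strip_diacritics` and `make_pair` are hypothetical helper names.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that e.g. 'č' becomes 'c'.

    Letters without a canonical decomposition (such as 'ł' or 'đ')
    are left unchanged by this approach.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_pair(text: str) -> tuple[str, str]:
    """Return (input without diacritics, target with diacritics)."""
    return strip_diacritics(text), text


source, target = make_pair("Náplava, Straňák, Hajič")
print(source)  # Naplava, Stranak, Hajic
print(target)  # Náplava, Straňák, Hajič
```

A restoration model is then trained to map `source` back to `target`; the notebooks in this repository may prepare their data differently.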

Short Problem Statement

Will be placed here at the proper time.

References

Yarowsky, D.: Comparison of corpus-based techniques for restoring accents in Spanish and French text. In: Natural Language Processing Using Very Large Corpora, pp. 99--120. Springer (1999).

Tufiş, D., Chiţu, A.: Automatic diacritics insertion in Romanian texts. In: Proceedings of COMPLEX 1999 International Conference on Computational Lexicography, Pecs, Hungary, pp. 185--194. (1999).

Ezeani, I.M., Hepple, M., Onyenwe, I.: Automatic Restoration of Diacritics for Igbo Language. In: Proceedings of International Conference on Text, Speech, and Dialogue, LNCS, vol 9924, pp. 198--205. Springer (2016). DOI: 10.1007/978-3-319-45510-5_23

Schlippe, T., Nguyen, T. L., Vogel, S. E.: Diacritization as a Machine Translation Problem and as a Sequence Labeling Problem. In: Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, pp. 270--278. (2008).

Náplava, J., Straka, M., Straňák, P., Hajič, J.: Diacritics Restoration Using Neural Networks. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), pp. 1566--1573. (2018).

Alqahtani, S., Mishra, A., Diab, M.: Efficient Convolutional Neural Networks for Diacritic Restoration. In: Proceedings of the 9th International Joint Conference on Natural Language Processing, pp. 1442--1448. (2019).

Orife, I.: Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text. In: Proceedings of INTERSPEECH 2018, pp. 2848--2852. (2018). DOI: 10.21437/Interspeech.2018-42

De Pauw, G., Wagacha, P. W., de Schryver, G.-M.: Automatic Diacritic Restoration for Resource-Scarce Languages. In: Proceedings of Text, Speech and Dialogue, 10th International Conference, pp. 170--179. (2007). DOI: 10.1007/978-3-540-74628-7_24