29 July - 9 August 2024, Leuven, Belgium
Course: The Challenges of Processing South Asian Languages
Area: Language and Computation (LaCo)
Level: Foundational
Lecturer(s):
- Kengatharaiyer Sarveswaran
- Tafseer Ahmed
Abstract: This course explores the intricacies of South Asian languages, focusing on the Indo-Aryan and Dravidian language families. It introduces their distinctive linguistic characteristics, covering script, encoding, transliteration, normalization, rich morphology, and syntax. A significant portion of the course examines how these aspects are handled in natural language processing (NLP), highlighting both the challenges and the possible solutions. The course thus offers an introduction to the languages of South Asia, a region that is home to one-fourth of the global population and has a diaspora of approximately 40 million people, and also introduces concepts that apply more generally to multilingual NLP and low-resource languages.
Kengatharaiyer Sarveswaran: University of Jaffna, Sri Lanka / University of Konstanz, Germany.
sarves.github.io / sarves@univ.jfn.ac.lk
Tafseer Ahmed: Alexa Translations, Canada.