Dialect-Graduation-Project

This is our graduation project for our BSc in computer science.

Graduation-Project

This is our bachelor's project in computer science. The goal is to classify Arabic text into five dialect groups: Gulf (GLF), Egyptian (EGY), Iraqi (IRQ), Levantine (LEV), and North African (NOR).
You can find the research paper at research/Graduation_project.pdf.
Here is a link to test the model: https://arabic.hawzen.me/

Credits

Data

Dataset: Source
SMADC: Areej Alshutayri and Eric Atwell. Classifying Arabic dialect text in the Social Media Arabic Dialect Corpus (SMADC). January 2021.
AOC-dialectal-annotations: Ryan Cotterell and Chris Callison-Burch. A multi-dialect, multigenre corpus of informal written Arabic. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 241–245, Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA).
annotated_data: Omar F. Zaidan and Chris Callison-Burch. The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 37–41, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
Dart: Israa Alsarsour, Esraa Mohamed, Reem Suwaileh, and Tamer Elsayed. DART: A large dataset of dialectal Arabic tweets. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association (ELRA).
extra_data: Collected by us (the project team).

Supervisor

Dr. Nasser A. AlSadhan

Researchers

Abdulrahman Al-Shawi
Musaad Al-Qubayl
Khaled Al-Bader
Abdullah Al-Suwailem
Mohand Al-Rasheed