Diacritics in Arabic are marks placed above or below letters. They give readers phonetic guidance and help them understand the text in its intended context. A single diacritical mark can entirely change the meaning of a word. Existing Optical Character Recognition (OCR) systems have difficulty accurately reading Arabic letters with diacritics, which lowers the quality of the digitized Arabic text.
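To make the point about meaning concrete, here is a minimal sketch in Python that is not part of tashkeelWAP itself. It relies on the fact that Arabic diacritics are encoded in Unicode as separate combining marks (general category `Mn`) following the base letter. The helper name `strip_diacritics` and the three example words are chosen for illustration only.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop every non-spacing combining mark (Unicode category Mn).

    Hypothetical helper for illustration: the Arabic short-vowel marks,
    sukun, and shadda all fall in this category.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# Three words written with the same base letters (ain, lam, meem),
# spelled with escapes so the code points are explicit.
knew = "\u0639\u064E\u0644\u0650\u0645\u064E"       # 'alima: "he knew"
was_known = "\u0639\u064F\u0644\u0650\u0645\u064E"  # 'ulima: "it was known"
knowledge = "\u0639\u0650\u0644\u0652\u0645"        # 'ilm: "knowledge"
bare = "\u0639\u0644\u0645"                         # the same letters, undiacritized

# With diacritics the strings are distinct ...
print(knew == was_known)  # False

# ... but once the marks are lost, all three collapse to one ambiguous form.
print(strip_diacritics(knew) == strip_diacritics(was_known) == bare)  # True
```

A system that drops or misreads these marks therefore leaves the reader to guess which of several words was intended.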
tashkeelWAP is a web application with two games that digitize Arabic text by outsourcing the task to native Arabic-speaking players. As a by-product of playing the games, we collect candidate digitizations of Arabic words with diacritics that OCR systems failed to recognize.
The project was implemented as part of my bachelor thesis. It resulted in a paper presented at the 2nd International Conference on Arabic Computational Linguistics (ACLing 2016).