----------------------------------------------
Moshakkel: Basic Arabic Diacritics Restoration
Ahmad Rajeh 5/8/2017
Project for Unix Tools, Dr. Kevin Scannell
*Core algorithm based on Zayyan et al. 2016
----------------------------------------------

Comments:
	Currently, the biggest problem with this software is that it is
	slow, probably because of repeated hard-disk read/write operations
	and repeated grepping of multimillion-line files.
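
	One possible way to avoid the repeated grepping is to read the
	lookup data once and keep it in memory. The following is only a
	minimal sketch, not Moshakkel's actual code: the tab-separated
	layout (bare word, diacritized form, count) and the names
	build_index and best_form are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(lines):
    """Read the lookup data once: bare word -> {diacritized form: count}."""
    index = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: bare<TAB>diacritized<TAB>count
        bare, diacritized, count = line.split("\t")
        index[bare][diacritized] = index[bare].get(diacritized, 0) + int(count)
    return index


def best_form(index, bare):
    """Return the most frequent diacritized form, or the word itself if unseen."""
    forms = index.get(bare)
    if not forms:
        return bare
    return max(forms, key=forms.get)


# Hypothetical sample rows standing in for a multimillion-line file.
rows = [
    ("كتب", "كَتَبَ", 12),
    ("كتب", "كُتُبٌ", 7),
    ("علم", "عِلْمٌ", 5),
]
sample_lines = ["%s\t%s\t%d\n" % row for row in rows]
index = build_index(sample_lines)
```

	With the index built once, each word costs a dictionary lookup
	instead of a scan of the whole file. The trade-off is memory:
	the entire table has to fit in RAM.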

	The percentage of correctly diacritized words reported by test.sh
	is likely an underestimate. On manual inspection, I rarely find
	undiacritized or misdiacritized words. In fact, some words that
	had no or only partial diacritization in the original text were
	correctly diacritized! test.sh (which is essentially diff) still
	counts those cases as mistakes because they do not exactly match
	the original diacritization, so the error rate is inflated.
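
	To illustrate why exact matching inflates the error rate, here
	is a sketch of a more lenient comparison that only checks the
	letters carrying diacritics in the reference. This is not what
	test.sh does; the names split_marks and lenient_match are
	hypothetical, and the mark range U+064B..U+0652 is an assumption
	about which characters count as diacritics.

```python
# Arabic tashkeel marks U+064B..U+0652 (fathatan through sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_marks(word):
    """Split a word into [base_letter, marks] pairs."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            pairs[-1][1] += ch  # attach the mark to the preceding letter
        else:
            pairs.append([ch, ""])
    return pairs


def lenient_match(reference, predicted):
    """True if predicted agrees with reference wherever reference has marks."""
    ref, pred = split_marks(reference), split_marks(predicted)
    if len(ref) != len(pred):
        return False
    for (r_base, r_marks), (p_base, p_marks) in zip(ref, pred):
        if r_base != p_base:
            return False
        # Letters left bare in the reference are not checked.
        # sorted() ignores the order of stacked marks such as shadda + vowel.
        if r_marks and sorted(r_marks) != sorted(p_marks):
            return False
    return True
```

	Under this comparison, a reference word with a mark on only its
	first letter matches a fully diacritized output that has the same
	mark on that letter, whereas a plain diff reports a mismatch. Note
	that it cannot tell whether the extra marks are actually correct,
	so it gives an upper bound on accuracy, not an exact figure.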
	
	Tested using the Tashkeela corpus as training data.