Text Normalization with Python
To run the program you will have to:
1- Have Python and pip installed on your machine
2- Install NLTK, langdetect and arabic_reshaper using this command:
pip install NLTK langdetect arabic_reshaper
3- Let NLTK download its data packages (for example by calling nltk.download() from a Python session)
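Step 3 can also be done from a small script instead of the interactive downloader. The sketch below is only an illustration: the package names (punkt, punkt_tab, stopwords) are assumptions based on the tokenization and stop-word features mentioned in this README, and the helper name download_nltk_packages is hypothetical. Adjust the list to whatever the program actually reports as missing.

```python
# Assumed list of NLTK data packages; "punkt_tab" is what recent NLTK
# releases look for, older ones use "punkt".
REQUIRED_PACKAGES = ["punkt", "punkt_tab", "stopwords"]


def download_nltk_packages(packages=REQUIRED_PACKAGES):
    """Download each NLTK data package and report which ones succeeded."""
    # Imported here so the error only appears when the helper is called
    # on a machine where NLTK has not been installed yet.
    import nltk

    results = {}
    for name in packages:
        # nltk.download returns True on success and False on failure.
        results[name] = nltk.download(name, quiet=True)
    return results
```

Calling download_nltk_packages() once after installation should be enough; it needs an internet connection, and any package that comes back as False can be retried.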
PS:
1- NLTK (Natural Language Toolkit) will not work without downloading its packages
2- Toggle between the English text and the Arabic text by changing "Doc1.txt" to "Doc2.txt", or the reverse, on line 17 of the code
3- To sort the result, uncomment line 60 of the code by removing the hash mark ( # )
4- NLTK does not support tokenization for the Arabic language, only stop words
5- If the letters in the Arabic text appear broken, go back to step 2 and make sure arabic_reshaper is installed
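The notes above can be sketched roughly as follows. This is not the program's actual code, only a minimal illustration of how langdetect, the NLTK stop-word lists and arabic_reshaper might fit together. The names LANG_TO_NLTK, filter_stop_words, normalize and for_display are hypothetical, and plain whitespace splitting is used as a stand-in for tokenization.

```python
# Assumed mapping from langdetect codes to NLTK stop-word list names.
LANG_TO_NLTK = {"en": "english", "ar": "arabic"}


def filter_stop_words(words, stop_words):
    """Return the words that are not in stop_words (case-insensitive)."""
    stop = {w.lower() for w in stop_words}
    return [w for w in words if w.lower() not in stop]


def normalize(text):
    """Detect the language, then drop stop words from whitespace-split text."""
    # Third-party imports are kept local so filter_stop_words can be
    # used on its own without them.
    from langdetect import detect
    from nltk.corpus import stopwords

    # Fall back to English when the detected language is not in the mapping.
    language = LANG_TO_NLTK.get(detect(text), "english")
    # Whitespace splitting stands in for a tokenizer here.
    return filter_stop_words(text.split(), stopwords.words(language))


def for_display(text):
    """Reshape Arabic letters so they join correctly when printed."""
    import arabic_reshaper

    return arabic_reshaper.reshape(text)
```

With this layout, normalize() picks the stop-word list from the detected language, and for_display() is only needed when printing Arabic text. Both require the NLTK packages from step 3 to be downloaded first.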
Made by:
* Mahmoud Mohamed A
* Marwan Atef A
* Mahmoud Mohamed Abdel Aziz
* Department: Bioinformatics
* Seat number: 103
* Marwan Atef Abdel Latif
* Department: Bioinformatics
* Seat number: 106