
Text Normalization with Python


To run the program you will have to:

1- Have Python and pip installed on your machine

2- Install NLTK, langdetect and arabic_reshaper using this command:
pip install NLTK langdetect arabic_reshaper
3- Let NLTK download its packages
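
Step 3 can be done from a Python session. The snippet below is a minimal sketch, assuming the program needs the `punkt` (tokenizer) and `stopwords` packages. This README does not name the exact packages, so adjust the list if NLTK reports a different missing resource (recent NLTK releases may ask for `punkt_tab`). The helper name `download_nltk_packages` is hypothetical.

```python
# Assumed package list: the README does not name the exact NLTK packages.
REQUIRED_PACKAGES = ("punkt", "stopwords")


def download_nltk_packages(packages=REQUIRED_PACKAGES, download=None):
    """Download each NLTK package and report which downloads succeeded.

    `download` defaults to nltk.download; it is a parameter only so the
    helper can be exercised without network access.
    """
    if download is None:
        import nltk  # imported lazily so the module loads without NLTK
        download = nltk.download
    # nltk.download returns True on success and False on failure.
    return {name: bool(download(name, quiet=True)) for name in packages}


# Typical use, run once after `pip install`:
# print(download_nltk_packages())
```

Running `nltk.download()` with no arguments opens NLTK's interactive downloader instead, which is an alternative if you prefer to pick packages by hand.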


Notes:

1- NLTK (Natural Language Toolkit) will not work without downloading its packages

2- Toggle between the English text and the Arabic text by changing "Doc1.txt" to "Doc2.txt" (or the opposite) in line 17 of the code

3- To sort the result, uncomment line 60 in the code by removing the hash mark ( # )

4- NLTK does not support tokenization for the Arabic language, only stop words

5- If the letters in the Arabic text appear broken, go back to step 2 of the setup instructions above and make sure arabic_reshaper is installed
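
The notes above can be illustrated with a short sketch. The program's source is not reproduced in this README, so this is not the actual script: the function names (`nltk_resources`, `normalize`, `reshape_for_display`) are hypothetical, and the whitespace fallback for Arabic is an assumption based on note 4. It uses only calls that exist in the libraries: `stopwords.words`, `word_tokenize`, and `arabic_reshaper.reshape`.

```python
def nltk_resources(language):
    """Return (tokenizer, stop_words) for "english" or "arabic".

    Imports are local so the rest of the sketch runs without NLTK installed.
    """
    from nltk.corpus import stopwords        # needs the "stopwords" package
    from nltk.tokenize import word_tokenize  # needs the tokenizer package

    stop_words = set(stopwords.words(language))
    # Assumption, following note 4: only English goes through NLTK's
    # tokenizer; Arabic falls back to plain whitespace splitting.
    tokenizer = word_tokenize if language == "english" else str.split
    return tokenizer, stop_words


def normalize(text, stop_words, tokenizer=str.split, sort_result=False):
    """Lowercase the tokens, drop stop words and punctuation-only tokens."""
    tokens = [token.lower() for token in tokenizer(text)]
    kept = [
        token for token in tokens
        if token not in stop_words and any(ch.isalnum() for ch in token)
    ]
    # Sorting is optional, mirroring note 3.
    return sorted(kept) if sort_result else kept


def reshape_for_display(text):
    """Join Arabic letters into their connected forms for display (note 5)."""
    import arabic_reshaper
    return arabic_reshaper.reshape(text)


# Example use (assumes Doc1.txt holds the English text):
# tokenizer, stop_words = nltk_resources("english")
# with open("Doc1.txt", encoding="utf-8") as handle:
#     print(normalize(handle.read(), stop_words, tokenizer, sort_result=True))
```

Passing the tokenizer and stop-word set as arguments keeps the language switch in one place, so moving between the English and Arabic documents only changes the call to `nltk_resources`.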

Made by:

* Mahmoud Mohamed Abdel Aziz
	* Department: Bioinformatics
	* Seat number: 103

* Marwan Atef Abdel Latif
	* Department: Bioinformatics
	* Seat number: 106