Clean-and-Segmentation-of-Arabic-Text

The scripts cover: segmenting text into sentences, and removing numbers, diacritics, repeated characters, non-Arabic words, punctuation, etc.
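The steps listed above could be sketched roughly as follows. This is not the repository's actual code: the function names `clean_sentence` and `segment_and_clean`, the Unicode ranges, the choice of sentence delimiters, and the rule of collapsing a character repeated three or more times are all assumptions made for illustration.

```python
import re

# Assumption: Arabic diacritics (tashkeel) are U+064B..U+0652.
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Western digits and Arabic-Indic digits.
DIGITS = re.compile(r"[0-9\u0660-\u0669]")
# Assumption: a character repeated 3+ times in a row is noise.
REPEATS = re.compile(r"(.)\1{2,}")
# Anything that is neither a word character nor whitespace.
PUNCTUATION = re.compile(r"[^\w\s]")
# Assumption: sentences end at ., !, ? or the Arabic question mark.
SENTENCE_END = re.compile(r"[.!?\u061F]+")
# A word made only of Arabic letters (U+0621..U+064A).
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def clean_sentence(sentence):
    """Strip diacritics, digits, repeats, punctuation and non-Arabic words."""
    s = DIACRITICS.sub("", sentence)
    s = DIGITS.sub("", s)
    s = REPEATS.sub(r"\1", s)      # collapse the run to one character
    s = PUNCTUATION.sub(" ", s)    # punctuation becomes a word boundary
    words = [w for w in s.split() if ARABIC_WORD.fullmatch(w)]
    return " ".join(words)


def segment_and_clean(text):
    """Split text into sentences, clean each one, drop empty results."""
    cleaned = (clean_sentence(part) for part in SENTENCE_END.split(text))
    return [sentence for sentence in cleaned if sentence]


if __name__ == "__main__":
    for line in segment_and_clean("مَرْحَبًا بالعالم 123! هذا نص hello جمييييل."):
        print(line)
```

Splitting into sentences happens before punctuation is removed, since the sentence delimiters would otherwise be lost.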


Input: Input.txt (the encoding must be UTF-8)

Output: the input file name with a "SEG_" prefix (ex: filename.txt ---> SEG_filename.txt)

Note: * The output file is written to the same directory as the script.

   **  The script is intended for Python 3.

   *** script.py must be in the same folder as Seg_clean.py.
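As a rough illustration of the input/output convention described above (UTF-8 input, "SEG_" prefix, output placed next to the script), a minimal sketch might look like this. The helper names `output_path_for` and `process_file` and the `transform` parameter are hypothetical and are not taken from Seg_clean.py.

```python
from pathlib import Path


def output_path_for(input_path, script_dir):
    """Return <script_dir>/SEG_<input file name>."""
    return Path(script_dir) / ("SEG_" + Path(input_path).name)


def process_file(input_path, script_dir, transform=lambda text: text):
    """Read a UTF-8 file, apply `transform`, write the SEG_ output file."""
    text = Path(input_path).read_text(encoding="utf-8")
    out_path = output_path_for(input_path, script_dir)
    out_path.write_text(transform(text), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # The script's own directory, matching the note on output location.
    here = Path(__file__).resolve().parent
    print(output_path_for("filename.txt", here))
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding, which matters for Arabic text on systems where the default is not UTF-8.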