We will go over the steps of data pre-processing:
- Data Cleaning:
The cleaning function removes noise to produce clean Arabic text without changing its meaning or content. It removes:
- Extra characters
- Emojis
- Non-Arabic characters
- URLs
- Punctuation
- Data Normalization: Alongside data cleaning, we normalized the Egyptian Arabic text by:
- Removing elongation (repeated letters).
- Correcting the spelling of Arabic sentences.
- Removing Tashkeel (diacritics) from the characters.
- Data Visualization: Visualize the text data using an Arabic word cloud.
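
The cleaning and normalization steps above can be sketched with Python's `re` module. This is a minimal illustration, not the project's actual code: the function names `clean_text` and `normalize_text`, the Unicode ranges, and the rule of collapsing three or more repeated letters into one are all assumptions. Spelling correction is left out because it depends on the ar-corrector package.

```python
import re

# Assumed patterns; the project's real cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not an Arabic letter/diacritic (U+0621-U+0652) or whitespace.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u0652\s]")
# Tashkeel marks, from fathatan (U+064B) to sukun (U+0652).
TASHKEEL_RE = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the stretching character U+0640.
TATWEEL_RE = re.compile(r"\u0640")
# An Arabic letter repeated three or more times in a row.
ELONGATION_RE = re.compile(r"([\u0621-\u064A])\1{2,}")


def clean_text(text: str) -> str:
    """Remove URLs, emojis, punctuation and other non-Arabic characters."""
    text = URL_RE.sub(" ", text)
    text = NON_ARABIC_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def normalize_text(text: str) -> str:
    """Remove Tashkeel and tatweel, then collapse elongated letters."""
    text = TASHKEEL_RE.sub("", text)
    text = TATWEEL_RE.sub("", text)
    # Assumption: a run of 3+ identical letters is elongation; keep one letter.
    return ELONGATION_RE.sub(r"\1", text)


if __name__ == "__main__":
    raw = "مرحبا!!! hello http://example.com جمييييل"
    print(normalize_text(clean_text(raw)))
```

Runs of exactly two identical letters are left alone in this sketch, since doubled letters occur in ordinary Arabic words.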
Packages and libraries used:
- Regular expression https://docs.python.org/3/library/re.html
- Python ar-Corrector https://pypi.org/project/ar-corrector/
- Arabic word cloud https://amueller.github.io/word_cloud/auto_examples/arabic.html
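
The visualization step can be sketched as follows, loosely following the Arabic example from the word cloud documentation linked above. The helper names `word_frequencies` and `render_word_cloud`, the `min_length` filter, and the output path are hypothetical. The rendering function assumes the third-party packages `wordcloud`, `arabic_reshaper` and `python-bidi` are installed and that `font_path` points to a font with Arabic glyphs.

```python
from collections import Counter


def word_frequencies(text: str, min_length: int = 2) -> Counter:
    """Count whitespace-separated words, skipping very short tokens."""
    words = [w for w in text.split() if len(w) >= min_length]
    return Counter(words)


def render_word_cloud(text: str, font_path: str, out_path: str = "wordcloud.png") -> str:
    """Render an Arabic word cloud to an image file (needs third-party packages)."""
    # Imported lazily so the frequency helper works without these packages.
    import arabic_reshaper
    from bidi.algorithm import get_display
    from wordcloud import WordCloud

    # Reshape letters into their joined forms and reorder for right-to-left display.
    shaped = get_display(arabic_reshaper.reshape(text))
    cloud = WordCloud(font_path=font_path, background_color="white").generate(shaped)
    cloud.to_file(out_path)
    return out_path


if __name__ == "__main__":
    sample = "جميل جميل رائع"
    print(word_frequencies(sample).most_common(2))
```

Checking the frequency counts first is a quick way to confirm the cleaned text looks sensible before rendering the image.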