NadaMontasser/Data-Preprocessing-on-Egyptian-dialect-text

Jupyter Notebook

Data-Preprocessing-on-Egyptian-dialect-text

This notebook walks through the following data preprocessing steps:

Data Cleaning: The cleaning function removes noise to deliver clean Arabic text without changing its meaning or content. The noise it removes includes:
- Extra characters
- Emojis
- Non-Arabic characters
- URLs
- Punctuation
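
The cleaning step listed above could be sketched with regular expressions roughly as follows. This is a minimal illustration, not the notebook's actual function: the name `clean_text`, the URL pattern, and the choice of Unicode range (U+0621 to U+0652, covering Arabic letters and Tashkeel marks) are assumptions made for this example.

```python
import re

# Assumed patterns for illustration; the notebook's own patterns may differ.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Anything that is not an Arabic letter (U+0621-U+064A), a Tashkeel mark
# (U+064B-U+0652), or whitespace. This also drops emojis, Latin text,
# digits, and both Latin and Arabic punctuation.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u0652\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Strip URLs and every non-Arabic character, then tidy whitespace."""
    text = URL_RE.sub(" ", text)         # remove URLs first, while intact
    text = NON_ARABIC_RE.sub(" ", text)  # remove emojis, punctuation, etc.
    return WHITESPACE_RE.sub(" ", text).strip()
```

URLs are removed before the character filter so that they are dropped as whole tokens; Tashkeel is deliberately kept here because it is handled in the normalization step.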

Data Normalization: Alongside data cleaning, we normalized the Egyptian text by:
- Removing elongation (repeated letters).
- Correcting the text by checking the spelling of Arabic sentences.
- Removing Tashkeel (diacritics) from the characters.
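
Two of the normalization steps above, removing Tashkeel and removing elongation, could be sketched as below. The function names, the rule that a run of three or more identical letters counts as elongation, and the removal of the tatweel character (U+0640) are assumptions for this example. Spelling correction is left out because it depends on the ar-corrector package.

```python
import re

# Assumed patterns for illustration; the notebook's own rules may differ.
TASHKEEL_RE = re.compile(r"[\u064B-\u0652]")  # fathatan through sukun
TATWEEL_RE = re.compile(r"\u0640")            # the stretching character
# An Arabic letter repeated three or more times in a row.
ELONGATION_RE = re.compile(r"([\u0621-\u064A])\1{2,}")


def remove_tashkeel(text):
    """Delete Arabic diacritic marks, leaving the base letters."""
    return TASHKEEL_RE.sub("", text)


def remove_elongation(text):
    """Drop tatweel and collapse runs of 3+ identical letters to one."""
    text = TATWEEL_RE.sub("", text)
    return ELONGATION_RE.sub(r"\1", text)


def normalize_text(text):
    """Apply Tashkeel removal, then elongation removal."""
    return remove_elongation(remove_tashkeel(text))
```

The threshold of three is a design choice: legitimately doubled letters, such as the two lams in الله, are left untouched, while stretched words such as جمييييل collapse to جميل.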

Data Visualization: Visualize the text data using an Arabic word cloud.
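
A word cloud sizes each word by how often it occurs, so the visualization starts from a frequency count. The sketch below shows only that counting step using the standard library. The helper name `word_frequencies` and its `top_n` parameter are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter


def word_frequencies(texts, top_n=None):
    """Count whitespace-separated tokens across an iterable of strings.

    Returns a dict mapping word -> count, most frequent first.
    If top_n is given, only the top_n most frequent words are kept.
    """
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return dict(counts.most_common(top_n))
```

A dictionary of this shape can be passed to `WordCloud.generate_from_frequencies` in the wordcloud package. Rendering Arabic correctly also needs an Arabic-capable font and right-to-left text handling, as covered in the Arabic word cloud example linked below.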

Packages and libraries used:

- Regular expressions (Python re module): https://docs.python.org/3/library/re.html
- Python ar-Corrector: https://pypi.org/project/ar-corrector/
- Arabic word cloud: https://amueller.github.io/word_cloud/auto_examples/arabic.html