An Arabic NLP Project.
Given a news story from the Moroccan online news website Hespress, what is the topic of that story?
- EDA
- Label Encoding of the topics
- Removing stopwords, punctuation, and numbers, and lowercasing English letters
- Stemming using the Arabic Stemmer "ISRIStemmer"
- Splitting and shuffling the data (the last 20% of each topic is kept for the test set)
- Feature extraction using TF-IDF
- Training and testing using the models:
  - Support Vector Machine
  - Multinomial NB
  - Random Forest
  - SGDClassifier
  - Logistic Regression
- Comparing the models
  - Support Vector Machine performed best, with a score of 85%
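
The cleaning and splitting steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names `clean_text` and `split_last_fraction_per_topic`, the tiny `ARABIC_STOPWORDS` set, and the `(text, topic)` row layout are all assumptions made for the example. Stemming, TF-IDF, and the classifiers are left out here.

```python
import random
import unicodedata
from collections import defaultdict

# Hypothetical, very small stopword set for illustration only;
# a real run would use a much fuller Arabic stopword list.
ARABIC_STOPWORDS = {"في", "من", "على", "إلى", "عن"}


def clean_text(text, stopwords=ARABIC_STOPWORDS):
    """Drop punctuation, symbols, numbers, and stopwords; lowercase Latin letters."""
    kept = []
    for ch in text.lower():  # lower() only changes cased scripts; Arabic has no case
        category = unicodedata.category(ch)
        # P* = punctuation, S* = symbols, N* = numbers (including Arabic-Indic digits)
        kept.append(" " if category[0] in "PSN" else ch)
    tokens = "".join(kept).split()
    return " ".join(t for t in tokens if t not in stopwords)


def split_last_fraction_per_topic(rows, test_fraction=0.2, seed=0):
    """Keep the last `test_fraction` of each topic for testing, then shuffle each set.

    `rows` is assumed to be a list of (text, topic) pairs in their original order.
    """
    by_topic = defaultdict(list)
    for text, topic in rows:
        by_topic[topic].append((text, topic))

    train, test = [], []
    for items in by_topic.values():
        n_test = int(len(items) * test_fraction)
        cut = len(items) - n_test
        train.extend(items[:cut])
        test.extend(items[cut:])

    # Shuffle after splitting so the test set is still the last part of each topic.
    rng = random.Random(seed)
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Shuffling after the split, rather than before it, is a design choice of this sketch: it keeps the "last 20% of each topic" rule intact while still mixing topics within each set.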