Natural Language Processing on IMDB reviews and 20 Newsgroups news posts using multiple ML models
Text preprocessing:
- Remove all non-word characters
- Convert the text to lowercase
- Remove all stop words
- Perform stemming/lemmatization
- Check and correct spelling
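
The preprocessing steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not necessarily the code used in this project: the `STOP_WORDS` set is a tiny hypothetical list, `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer or lemmatizer, and the spell-checking step is left out because it needs an external dictionary.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "i", "is", "was", "it", "this", "and", "of", "to", "in"}


def crude_stem(word):
    """Strip a few common suffixes; a naive stand-in for a real stemmer/lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the cleaning steps in order and return a space-joined string."""
    text = re.sub(r"[^A-Za-z]+", " ", text)               # remove non-word characters
    tokens = text.lower().split()                         # lowercase and tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # drop stop words
    return " ".join(crude_stem(t) for t in tokens)        # crude stemming


print(preprocess("This movie was AMAZING!!! I loved the 2 actors."))
# prints: movie amaz lov actor
```

In practice a library stemmer or lemmatizer and a complete stop-word list would replace the hypothetical helpers shown here.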
Use the scikit-learn package with hyperparameter tuning (Pipeline, GridSearchCV):
- Random Forest
- AdaBoost
- SVM
- Decision Tree
- Logistic Regression
- KNN
- Naive Bayes
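
As a rough illustration of how a Pipeline and a grid search fit together for any of the models listed above, here is a minimal sketch using logistic regression. The toy `texts` and `labels`, the TF-IDF vectorizer, and the parameter grid values are all assumptions chosen to keep the example small; they are not the settings used in this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus (assumption): 1 = positive review, 0 = negative review.
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "I loved every minute of it",
    "an excellent, funny movie",
    "a boring and terrible film",
    "awful acting and a weak story",
    "I hated every minute of it",
    "a dull, painful movie",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier are chained so both are tuned together.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keys follow the "<step name>__<parameter>" convention; values are illustrative.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy corpus is tiny; real data would allow more folds.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(search.best_score_)
```

Swapping in another model should only require changing the `"clf"` step and the `clf__...` entries of the grid.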
Data used:
- IMDB reviews
- 20 Newsgroups, with 'headers', 'footers', and 'quotes' removed
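
For the 20 Newsgroups data, the removal described above corresponds to the `remove` argument of scikit-learn's `fetch_20newsgroups`. The sketch below assumes that loader is the one used; the names `REMOVE_PARTS` and `load_newsgroups` are hypothetical helpers introduced for illustration. The IMDB reviews have no equivalent scikit-learn loader and are not covered here.

```python
from sklearn.datasets import fetch_20newsgroups

# Parts of each post to strip, as listed above.
REMOVE_PARTS = ("headers", "footers", "quotes")


def load_newsgroups(subset="train"):
    """Return (texts, targets) for the given subset with metadata stripped.

    Note: the first call downloads the dataset, so it needs network access.
    """
    bunch = fetch_20newsgroups(subset=subset, remove=REMOVE_PARTS)
    return bunch.data, bunch.target


# Example usage (commented out because it triggers a download):
# train_texts, train_targets = load_newsgroups("train")
# test_texts, test_targets = load_newsgroups("test")
```

Stripping these parts keeps a classifier from relying on metadata such as sender addresses or quoted replies rather than on the body text.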