Mini-project 2 — Text Classification

The second mini-project of COMP 551 - Applied Machine Learning.

Text classification is performed with logistic regression, decision trees, SVM, AdaBoost, random forest, and multinomial naive Bayes.
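As a rough illustration of how these six classifiers can be compared on the same text features, here is a minimal sketch. It assumes scikit-learn and TF-IDF features, neither of which is confirmed by this README. The toy corpus, the `build_models` and `fit_all` helpers, and all hyperparameters are hypothetical and are not taken from the project files.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy corpus made up for illustration only (not project data).
train_texts = [
    "a great film with a wonderful cast",
    "wonderful story and great acting",
    "I loved this great movie",
    "an excellent and moving film",
    "a terrible film with awful acting",
    "boring story and awful dialogue",
    "I hated this terrible movie",
    "a dull and boring film",
]
train_labels = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]


def build_models():
    """Return one untrained estimator per model family named above.

    Hyperparameters are placeholder choices, not the project's settings.
    """
    return {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "SVM": LinearSVC(),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "multinomial naive Bayes": MultinomialNB(),
    }


def fit_all(texts, labels):
    """Fit a TF-IDF + classifier pipeline for every model and return them by name."""
    fitted = {}
    for name, clf in build_models().items():
        # Each pipeline gets its own vectorizer so the models stay independent.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(texts, labels)
        fitted[name] = pipe
    return fitted


if __name__ == "__main__":
    for name, pipe in fit_all(train_texts, train_labels).items():
        print(name, pipe.predict(["a great and wonderful movie"])[0])
```

Wrapping the vectorizer and the classifier in one pipeline keeps the feature extraction inside any later cross-validation, so test folds do not influence the vocabulary or the IDF weights.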

The files below cover the whole project. The other files are supplementary and served as a playground for testing.

COMP_551_Mini_Project_2_20_newsgroups.ipynb — Text classification of the 20 newsgroups dataset.
P2-IMDB.py — Text classification of the IMDB reviews dataset.
writeup.pdf — project write-up.

Completeness (20 out of 20): Everything was implemented.
Correctness (37 out of 40): The understanding of the models is good.
Writing (23 out of 25): In general, the writing is good. The abstract and introduction are not what they should be. You don't show enough figures that support your claims and on which you can build to go into more details.
Originality (10 out of 15): Limited additional experiments in terms of extra models or ways to perform preprocessing.

Total: 90%

The repository was private for the duration of development and was made public on Apr 19, 2020.