Multi-genre Classification on Literary Books

Advanced Topics in Machine Learning

AUTH, Data and Web Science MSc program

This is the repository for the "Advanced Topics in Machine Learning" course.

This project addresses multi-label genre classification from book descriptions, using multi-label learning techniques such as:

  • Binary Relevance
  • Classifier Chains
  • Label Powerset
  • Deep Learning
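The first three techniques are problem transformations: each one turns the multi-label targets into one or more single-label problems. The sketch below shows only that target transformation on hypothetical toy data, using the standard library alone. The genre names, the sample labels, and the helper function names are assumptions made for illustration and are not taken from this repository's code.

```python
# Hypothetical toy data: each book is tagged with a set of genres.
genres = ["fantasy", "romance", "thriller"]
y = [
    {"fantasy", "romance"},
    {"thriller"},
    {"fantasy", "romance"},
    {"romance"},
]


def binary_relevance_targets(label_sets, labels):
    """Binary Relevance: one independent 0/1 target vector per label."""
    return {label: [int(label in s) for s in label_sets] for label in labels}


def label_powerset_targets(label_sets):
    """Label Powerset: each distinct label set becomes one multi-class id."""
    class_ids = {}
    targets = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


def chain_features(x_row, previous_labels):
    """Classifier Chains: the input of each link is the original features
    extended with the labels of the earlier links in the chain."""
    return list(x_row) + [int(v) for v in previous_labels]


br_targets = binary_relevance_targets(y, genres)
lp_targets, lp_classes = label_powerset_targets(y)
second_link_input = chain_features([0.5, 1.2], [1])
```

Binary Relevance ignores correlations between genres, Label Powerset captures them at the cost of many rare classes, and Classifier Chains sit in between by passing earlier labels along the chain.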

Furthermore, the problem of class imbalance is addressed using different methods. Finally, we explore multiple Active Learning approaches to simulate a real-world NLP problem in which labeled data are often unavailable and manual annotation is difficult and time-consuming.
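As an illustration of the Active Learning setting, the sketch below shows one common query strategy, uncertainty sampling, adapted to multi-label predictions. It is not necessarily one of the approaches used in this project: the scoring rule, the function names, and the probability values are all assumptions chosen for the example.

```python
def uncertainty(label_probs):
    """Mean closeness of the per-label probabilities to 0.5.
    Returns a value in [0, 1]; higher means the model is less certain."""
    return sum(1.0 - 2.0 * abs(p - 0.5) for p in label_probs) / len(label_probs)


def select_queries(pool_probs, batch_size):
    """Return the indices of the most uncertain unlabeled examples,
    i.e. the ones to send to a human annotator next."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical per-genre probabilities predicted for three unlabeled books.
pool_probs = [
    [0.95, 0.02, 0.90],  # confident on every genre
    [0.55, 0.48, 0.60],  # close to 0.5 everywhere
    [0.80, 0.30, 0.10],  # in between
]
queries = select_queries(pool_probs, batch_size=2)
```

In a full loop, the selected examples would be labeled, added to the training set, and the model retrained before the next round of selection.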

Dependencies

To reproduce this project, it is highly recommended to create a new Python 3 virtual environment and install the packages listed in requirements.txt:

python3 -m venv advanced_ml_venv
source advanced_ml_venv/bin/activate
pip install -r requirements.txt