/Amazon-Reviews

Final project for the Text Mining and Search exam. Creating models to predict overall rating of the review.

Primary LanguageJupyter Notebook

Wish Upon a Star

TOC

Overview

This repository hosts the code for the final project for the Text Mining and Search course. The task we aim to solve is a binary classification of Amazon product reviews from the Clothing, Shoes and Jewellery category into positive and negative using the text of the review as the input data.

How to Use

In order to explore our work, please refer to the Main Notebook.ipynb file. This notebook hold all the code, NLP processing and classification models used presented in a complete way.

This notebook is set up for execution on Google Colab. You may need or wish to modify a couple or lines of code:

  1. In the first line of code, the dataset is downloaded. If you already have it in another location, you can avoid running the cell. Be sure to correctly specify the file path (3rd cell).
  2. The results are by default saved in a Google Drive folder. You will need to change the results folder path in order to match you own Drive structure. Please note that the Results folder must exists before the notebook runs.

Other Repo Files

The Experiments directory holds the code that was used to decide which NLP pipeline and which classifiers to use. It was structured for reusability and modularity. The code in this folder was then refactored and included in the Main Notebook.ipynb with the goal of making the notebook as understandable and self-contained as possible.

The two utils and nlp modules act as libraries for the other main files.

Data

The data can be found at this link.

About Us

Davide Toniolo

  • Current Studies: Student at the Master Degree in Data Science, University of Milano Bicocca
  • Background: Bachelor Degree in Physics, University of Milano Bicocca

Lorenzo Camaione

  • Current Studies: Student at the Master Degree in Data Science, University of Milano Bicocca
  • Background: Bachelor Degree in Computer Science, University of L'Aquila

Acknowledgements

Thanks to @malborroni for the this awesome readme template.