
On the evaluation of BERT-based models for Italian Sentiment Analysis

This is the GitHub repository of the final project for the course Natural Language Understanding @ UNITN, academic year 2020-2021. The project received a final grade of 30/30 cum laude.

Task

The goal of this project was to evaluate two BERT-based Italian models on several Italian datasets from different domains, in order to compare their performance and assess their generalization capabilities, both with and without fine-tuning.

Models

The two BERT-based Italian models compared in this project are AlBERTo and Feel-IT.
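
As a quick sanity check of the setup, either model can be queried through the Hugging Face transformers pipeline. The snippet below is a minimal sketch, assuming the public Feel-IT sentiment checkpoint is available on the Hub as MilaNLProc/feel-it-italian-sentiment; the exact identifiers used in the notebooks may differ.

```python
# Minimal sketch: zero-shot sentiment inference with a BERT-based Italian model.
# The model id below is an assumption (the public Feel-IT checkpoint); the
# notebooks may point to different checkpoints for Feel-IT and AlBERTo.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="MilaNLProc/feel-it-italian-sentiment",
)

examples = [
    "Che bella giornata, sono davvero felice!",       # expected: positive
    "Il servizio era pessimo, non tornerò mai più.",  # expected: negative
]

for text in examples:
    print(text, "->", classifier(text)[0])
```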

Data

To run the code, make sure to have all the datasets in data/:

Code

  • AlBERTo reproduction.ipynb: code to reproduce the results presented in the AlBERTo paper, with a utility to run the same experiment multiple times with different random seeds, in order to collect evidence and compute statistics about convergence (a minimal sketch of this idea follows the list below)
  • AlBERTo enhancements.ipynb: code to explore some minor architectural and hyper-parameter changes, in order to improve AlBERTo's performance on polarity classification
  • AlBERTo multiclass.ipynb: code to adapt AlBERTo to multi-class prediction, along with some hyper-parameter tuning and a train/validation split (also sketched after this list)
  • Error analysis.ipynb: code to train and test AlBERTo and Feel-IT on the various datasets, with and without fine-tuning
  • report.pdf: the final project report
  • SOM.pdf: downloaded copy of the Support Online Material cited in the report
  • Project presentation.pdf: slides used for the presentation of the project with Prof. Riccardi and Gabriel Roccabruna (25-09-2021)
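
The multi-seed utility in AlBERTo reproduction.ipynb boils down to repeating the same train-and-evaluate run under different random seeds and summarising the resulting metric. Below is a minimal, self-contained sketch of that protocol; run_experiment is a placeholder for the notebook's actual training loop and here only returns a dummy score.

```python
# Sketch of the multi-seed protocol used to collect convergence statistics.
# `run_experiment` is a placeholder: in the notebook it fine-tunes AlBERTo and
# returns a test metric, here it returns a dummy value so the sketch runs.
import random
import statistics

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Make a single run reproducible across Python, NumPy and PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored if CUDA is unavailable


def run_experiment(seed: int) -> float:
    """Placeholder for one train-and-evaluate run; returns a dummy score."""
    set_seed(seed)
    return random.random()  # stand-in for the real test accuracy / F1


seeds = [0, 1, 2, 3, 4]
scores = [run_experiment(s) for s in seeds]
print(f"mean={statistics.mean(scores):.4f}  std={statistics.stdev(scores):.4f}")
```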
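
Similarly, the multi-class adaptation in AlBERTo multiclass.ipynb comes down to attaching a classification head with three labels and carving a validation split out of the training data. The sketch below illustrates those two steps with a generic Italian BERT checkpoint and toy data; both the checkpoint id and the examples are placeholders, not the actual model or datasets used in the notebook.

```python
# Sketch: multi-class classification head + train/validation split.
# The checkpoint id and the toy texts/labels are placeholders; the notebook
# uses AlBERTo and the project's own datasets.
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "dbmdz/bert-base-italian-uncased"  # placeholder Italian BERT
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=3,  # e.g. negative / neutral / positive
)

texts = ["ottimo prodotto", "servizio terribile", "consegna nella media", "tutto ok"]
labels = [2, 0, 1, 2]

train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)
train_encodings = tokenizer(train_texts, truncation=True, padding=True, return_tensors="pt")
val_encodings = tokenizer(val_texts, truncation=True, padding=True, return_tensors="pt")
```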

Environment

The file environment.yml defines a Conda environment with all the required packages; it can be created with conda env create -f environment.yml.

Report

A more detailed description of this project is available in report.pdf. The supplementary online material cited in the report is available here; a downloaded copy is also included in this repository as SOM.pdf.