Project for the Deep Learning course - University of Bologna, Cesena Campus
Authors:
- Veronika Folin
- Anna Vitali
- Elena Yan
The aim of this project is to evaluate the performance of several deep learning models on the Grammatical Error Correction (GEC) task, which consists of transforming a potentially incorrect input sentence into a corrected version.
The input text can contain different types of errors, such as spelling, punctuation, grammar, or word-choice errors.
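To make the task concrete, the sketch below compares an erroneous sentence with its correction and lists the token-level edits between them. The sentence pair and the helper name `token_edits` are made up for illustration and are not taken from the dataset or the notebook; the comparison relies only on Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def token_edits(source, target):
    """Return (operation, source_tokens, target_tokens) for each differing span."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans where the two sentences differ.
        if tag != "equal":
            edits.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


# Hypothetical GEC pair: a grammatical error (wrong verb tense) and its fix.
source = "She go to school yesterday"
target = "She went to school yesterday"
print(token_edits(source, target))
```

A GEC model receives only the source sentence and is expected to produce the target; a diff like this one is just a convenient way to inspect what was changed.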
The dataset used for the experiments is the C4 200M Synthetic Dataset, a collection of 185M sentence pairs with grammatical errors generated from C4 (Colossal Clean Crawled Corpus).
We tested the following types of models:
- Encoder-Decoder with Recurrent Neural Networks
- Transformer
We tested the following optimizers:
- Adam
- RMSprop
We tested the following loss functions:
- categorical_crossentropy
- sparse_categorical_crossentropy
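The two loss names match Keras loss identifiers. In Keras, `categorical_crossentropy` expects one-hot encoded targets, while `sparse_categorical_crossentropy` expects integer class indices; for the same prediction they yield the same value. The sketch below illustrates this for a single token with plain-Python re-implementations. These functions are simplified stand-ins written for this example, not the Keras API and not the code used in the notebook, and the probability values are made up.

```python
import math


def categorical_crossentropy(one_hot_target, probs):
    """Cross-entropy for a one-hot encoded target (illustrative re-implementation)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy for an integer class index (illustrative re-implementation)."""
    return -math.log(probs[class_index])


# Hypothetical predicted distribution over a 3-word vocabulary.
probs = [0.1, 0.7, 0.2]

loss_dense = categorical_crossentropy([0, 1, 0], probs)  # target as one-hot vector
loss_sparse = sparse_categorical_crossentropy(1, probs)  # same target as an index

print(loss_dense, loss_sparse)
```

With a large vocabulary, integer targets avoid materializing a one-hot vector per token, which is the usual practical reason to prefer the sparse variant.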
We used the following evaluation metrics to assess model performance:
The complete work is in the final notebook, notebook.ipynb.
A summary of the experiments' configurations and results is in the experiment_results.xlsx file.