This repository contains the code accompanying our NLP project.
The src directory contains the code we used to build the T5 semantic parser, as well as the code for translating the data.
- We attempted to reproduce Dangle's approach but were unable to get it working; Dangle's own code for running RoBERTa does not work either.
- Baseline.py and Baseline_Multilingual.py implement our standard T5 semantic parser and a multilingual variant.
- Below are example commands for using this code:
Training the monolingual model:
python Baseline.py train --cuda --train_dir ../data/train.tsv --val_dir ../data/dev.tsv --test_dir ../data/test.tsv --T5_modelname t5-base --save_dir ../models/baseline/ --epochs 200 --batch_size 64 --lr 0.0002
Evaluating the monolingual model:
python Baseline.py evaluate --cuda --test_dir ../data/gen.tsv --checkpoint_dir ../models/baseline/ --batch_size 64
Training the multilingual model:
python Baseline_Multilingual.py train --cuda --train_dir ../data/train_translated_trimmed.tsv --val_dir ../data/dev_translated_trimmed.tsv --test_dir ../data/test_translated_trimmed.tsv --T5_modelname google/mt5-small --save_dir ../models/baseline-translated/ --epochs 200 --batch_size 16 --lr 0.00002
Evaluating the multilingual model:
python Baseline_Multilingual.py evaluate --cuda --test_dir ../data/gen_translated.tsv --checkpoint_dir ../models/baseline-translated/ --batch_size 32
Training our T5 Dangle variant (buggy/broken):
python Dangle.py train --cuda --train_dir ../data/train.tsv --val_dir ../data/dev.tsv --test_dir ../data/test.tsv --T5_modelname t5-base --save_dir ../models/dangle/ --epochs 200 --batch_size 64 --lr 0.0002
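The commands above all share the same train/evaluate subcommand interface. As a rough sketch, the scripts could expose it via argparse along the following lines; the subcommand and flag names match the commands above, but the internal structure here is an assumption, not the actual implementation in Baseline.py.

```python
# Hypothetical sketch of the CLI shared by Baseline.py, Baseline_Multilingual.py,
# and Dangle.py. Flag names mirror the example commands; defaults are guesses.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="T5 semantic parser")
    sub = parser.add_subparsers(dest="mode", required=True)

    train = sub.add_parser("train")
    train.add_argument("--train_dir")
    train.add_argument("--val_dir")
    train.add_argument("--test_dir")
    train.add_argument("--T5_modelname", default="t5-base")
    train.add_argument("--save_dir")
    train.add_argument("--epochs", type=int, default=200)
    train.add_argument("--batch_size", type=int, default=64)
    train.add_argument("--lr", type=float, default=2e-4)
    train.add_argument("--cuda", action="store_true")

    evaluate = sub.add_parser("evaluate")
    evaluate.add_argument("--test_dir")
    evaluate.add_argument("--checkpoint_dir")
    evaluate.add_argument("--batch_size", type=int, default=64)
    evaluate.add_argument("--cuda", action="store_true")
    return parser
```

With a parser like this, `args = build_parser().parse_args()` followed by dispatching on `args.mode` reproduces the train/evaluate split seen in the commands above.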
The data directory contains the data on which our parsers were trained and evaluated.
- The translated data sets are very large because they contain the translations of each sample in every language.
- The "trimmed" translated data sets only contain one language per sample and an even distribution of languages.