This repository contains the project for measuring negative interference in a multilingual meta-learning setup for the task of dependency parsing.
We build upon the papers *Meta-learning for fast cross-lingual adaptation in dependency parsing* and *On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment*, along with the Udify codebase.
Ideally, set up a conda environment and install all the requirements. The `jobfiles/` folder contains all the `.sh` files required to run on Lisa. Use `lisaatcs.job` to set up your environment.
Create the data directories inside `Negative-Interference-UD`:

`mkdir -p data/ud-treebanks-v2.3`
`mkdir -p data/exp-mix`
`mkdir -p data/concat-exp-mix`

Navigate back to the `metalearning` directory (`cd ..`) and download the data:

`bash ./scripts/download_ud_data.sh`
It seems that `download_ud_data.sh` not only downloads the data but also creates a treebank for all languages.
Next, run the script that copies the treebanks of all languages used in the original paper (based on its Table 7). You can run it from the root `metalearning` directory:

`python scripts/make_expmix_folder.py`
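For reference, a minimal sketch of what such a copy step could look like; the language list and destination folder here are illustrative, not the exact ones used by `make_expmix_folder.py`:

```python
import shutil
from pathlib import Path

# Illustrative subset; the real script uses the languages from Table 7 of the paper.
TREEBANKS = ["UD_English-EWT", "UD_Hindi-HDTB", "UD_Italian-ISDT"]

src_root = Path("data/ud-treebanks-v2.3")
dst_root = Path("data/exp-mix")
dst_root.mkdir(parents=True, exist_ok=True)

for name in TREEBANKS:
    src = src_root / name
    if src.is_dir():
        # Copy the whole treebank folder (train/dev/test .conllu files).
        shutil.copytree(src, dst_root / name, dirs_exist_ok=True)
```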
Afterward, you can pass the name of the folder containing all these treebanks to concatenate them. `concat_treebanks.py` imports Udify's `util.py`, which in turn imports packages such as torch, so we need to run it as a batch job. For that, you can use `concat_treebanks.sh`. Run it from the root `metalearning` directory with:

`sbatch concat_treebanks.sh`
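Conceptually, the concatenation just appends the per-language `.conllu` files into a single combined train/dev/test set. A standalone sketch of that idea is below; the file-name pattern is an assumption based on the usual UD naming, and the repo's own `concat_treebanks.py` goes through Udify's utilities instead:

```python
from pathlib import Path

exp_mix = Path("data/exp-mix")
out_dir = Path("data/concat-exp-mix")
out_dir.mkdir(parents=True, exist_ok=True)

for split in ("train", "dev", "test"):
    with open(out_dir / f"{split}.conllu", "w", encoding="utf-8") as out:
        # Append every language's split file one after another.
        for conllu in sorted(exp_mix.glob(f"*/*-ud-{split}.conllu")):
            out.write(conllu.read_text(encoding="utf-8"))
```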
After concatenating the treebanks of all relevant languages, create the vocabulary (takes around 15 minutes):

`sbatch create_vocabs.sh`
Then, in the config file `config/ud/en/udify_bert_finetune_en_ewt.json`, change the vocabulary path to the one you just created, as Udify copies the vocabulary to multiple places during the train and test process.
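If you prefer to update the config programmatically, something like the sketch below works for a plain-JSON config. The key layout (`vocabulary.directory_path`) and the vocabulary path are assumptions based on the usual AllenNLP/Udify convention, so double-check them against your actual file:

```python
import json

config_path = "config/ud/en/udify_bert_finetune_en_ewt.json"
with open(config_path) as f:
    config = json.load(f)

# Point the vocabulary at the one created by create_vocabs.sh.
# Both the key layout and the path below are assumptions; verify against your setup.
config.setdefault("vocabulary", {})["directory_path"] = "data/vocab/concat-exp-mix/vocabulary"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```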
We use many pre-training languages; example job files are in the `jobfiles/` directory. For example, to fine-tune on Hindi, run `hindipretrain.job`. Refer to the paper for the parameters, and do not forget to change the `path` in the respective config file.
- Add PyTorch and the other required libraries to your environment if they were not added before.
- Check your unique path to the pre-trained mBERT model generated by pretraining. Check the `logs/` folder for the generated logs.
- The fine-tuning process creates a `model.tar.gz` file and other metadata, including `best.th`. (Note: some branches might not have this updated, so make sure `model.tar.gz` is zipped in the same location and rename `weights.th` to `best.th` with `mv weights.th best.th`.)
- Modify `train_meta.sh` to use the correct `--model_dir` from your pretraining, and change the other flags as desired. With the default parameters, it takes around 20 hours.
- As an example, run `hindimetatrain.sh` for the Hindi pre-trained model.
- The numpy arrays containing the gradient similarities are located in `cos_matrices`. Gradient similarities are checkpointed according to the `save_every` parameter; see the sketch after this list for how to inspect them.
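The saved arrays can be inspected directly with NumPy. A minimal sketch, assuming each checkpoint is a square language-by-language matrix of pairwise gradient cosine similarities stored as a `.npy` file (check the actual file names produced by your run):

```python
import numpy as np
from pathlib import Path

# One file per checkpoint, saved every `save_every` steps.
for path in sorted(Path("cos_matrices").glob("*.npy")):
    sims = np.load(path)
    # Negative off-diagonal entries indicate conflicting (negatively
    # interfering) gradients between a pair of languages.
    upper = sims[np.triu_indices_from(sims, k=1)]
    frac_conflict = (upper < 0).mean()
    print(f"{path.name}: shape={sims.shape}, conflicting pairs={frac_conflict:.2f}")
```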
NOTE: It is not possible to run the full training on a GPU with less than 24 GB of memory! On Lisa we therefore use the RTX Titan; the job file already requests it (`gpu_titanrtx_shared_course`). Even with a 24 GB GPU, OOM errors might still occur!
For evaluation, or meta-testing, we use the script `metatest_all.py`. It generates a folder such as `metavalidation_0.0001_1e-05_20_20_sgd_saved_models-XMAML_0.001_0.001_0.001_0.001_5_9999_1` containing the scores in JSON files.
Run:

`python metatest_all.py --validate True --lr_decoder 0.0001 --lr_bert 1e-04 --updates 20 --support_set_size 20 --optimizer sgd --seed 3 --episode 500 --model_dir saved_models/XMAML_0.0005_5e-05_0.0005_5e-05_20_9999`

where the path for `--model_dir` was created by running `train_meta.py` and its name corresponds to the parameters of that run. This step can be done without the RTX GPU.
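Once it has finished, the scores can be collected from the generated JSON files. A sketch under the assumption that each file holds a flat dictionary of metrics; the metric key (`LAS` here) is a guess, so inspect one file to find the real names:

```python
import json
from pathlib import Path

# Replace with the folder generated by your own run.
results_dir = Path("metavalidation_0.0001_1e-05_20_20_sgd_saved_models-XMAML_0.001_0.001_0.001_0.001_5_9999_1")

scores = {}
for score_file in sorted(results_dir.glob("*.json")):
    with open(score_file) as fh:
        metrics = json.load(fh)
    scores[score_file.stem] = metrics.get("LAS")  # assumed key

for name, las in scores.items():
    print(f"{name}: LAS = {las}")
```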
For meta-testing we need the tiny-treebanks splits for cross-validation. Run `python split_files_tiny_auto.py` and it will take care of creating the test files.
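For intuition, splitting a tiny treebank for cross-validation amounts to chunking its sentences into folds. The sketch below is a standalone illustration of that idea only; the actual `split_files_tiny_auto.py` handles the repo's own file layout, and the fold count and output folder here are placeholders:

```python
from pathlib import Path

def read_conllu_sentences(path):
    """Split a .conllu file into sentence blocks (separated by blank lines)."""
    text = Path(path).read_text(encoding="utf-8")
    return [block for block in text.split("\n\n") if block.strip()]

def write_folds(path, n_folds=5, out_dir="tiny-splits"):
    sentences = read_conllu_sentences(path)
    Path(out_dir).mkdir(exist_ok=True)
    for i in range(n_folds):
        # Every n-th sentence goes to fold i.
        fold = sentences[i::n_folds]
        out = Path(out_dir) / f"{Path(path).stem}-fold{i}.conllu"
        out.write_text("\n\n".join(fold) + "\n\n", encoding="utf-8")
```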
We run the same command as for validation, but without the `--validate` flag:

`python metatest_all.py --lr_decoder 0.0001 --lr_bert 1e-04 --updates 20 --support_set_size 20 --optimizer sgd --seed 3 --episode 500 --model_dir saved_models/XMAML_0.0005_5e-05_0.0005_5e-05_20_9999`

This needs more than 8 GB of GPU memory.
You can visualize the gradient conflicts saved in the `cos_matrices` directory. Use `visualize.ipynb` to generate the conflict graph and epoch-level gradient information.
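For a quick look outside the notebook, a gradient-similarity matrix can also be plotted as a heatmap. A minimal matplotlib sketch; the file name is a placeholder and the matrix is assumed to contain cosine similarities in [-1, 1]:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder file name; use one of the arrays saved in cos_matrices/.
sims = np.load("cos_matrices/epoch_0.npy")

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(sims, cmap="coolwarm", vmin=-1.0, vmax=1.0)
ax.set_title("Pairwise gradient cosine similarity")
fig.colorbar(im, ax=ax, label="cosine similarity")
plt.tight_layout()
plt.savefig("gradient_conflicts.png")
```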