This code was used to obtain the results described in the paper:
Cooperative Learning of Disjoint Syntax and Semantics
Serhii Havrylov,
Germán Kruszewski,
Armand Joulin
Presented at NAACL 2019
- Download the ListOps dataset. URLs of the original dataset and an extrapolation test set can be found in the
data/listops/external/urls.txt
file.
- Run
python listops/data_preprocessing/split.py
to split the dataset into train, validation, and test sets. Make sure that you have downloaded the dataset and that the files are present in the
data/listops/external
folder.
- Build the vocabulary using
python listops/data_preprocessing/build_vocab.py
- Run
python listops/ppo/train_ppo_model.py
or
python listops/reinforce/train_reinforce_model.py
to train the model with the PPO or REINFORCE estimator.
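
The ListOps steps above can be chained in a small driver script. The following is a minimal sketch and not part of the repository: the helper names `listops_commands` and `run_listops` are hypothetical, and it assumes the scripts are invoked from the repository root without extra arguments, exactly as in the commands above.

```python
import subprocess
import sys

# Script paths are taken from the steps above; everything else is illustrative.
PREPROCESSING = [
    "listops/data_preprocessing/split.py",
    "listops/data_preprocessing/build_vocab.py",
]
TRAINING = {
    "ppo": "listops/ppo/train_ppo_model.py",
    "reinforce": "listops/reinforce/train_reinforce_model.py",
}


def listops_commands(estimator="ppo"):
    """Return the ListOps commands in the order they should be run."""
    if estimator not in TRAINING:
        raise ValueError(f"unknown estimator: {estimator!r}")
    scripts = PREPROCESSING + [TRAINING[estimator]]
    # Use the current interpreter so the active environment is reused.
    return [[sys.executable, script] for script in scripts]


def run_listops(estimator="ppo", dry_run=False):
    """Run each step in turn, stopping at the first failure."""
    for command in listops_commands(estimator):
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
```

Calling `run_listops("reinforce", dry_run=True)` only prints the three commands, which is a cheap way to confirm the order before starting a long training run.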
- Run
python sst/ppo/train_ppo_model.py
to train the model on the SST-2 or SST-5 dataset.
- Download the SNLI and MultiNLI datasets and extract the corresponding archives to the
data/nli
folder. URLs can be found in the
data/nli/external/urls.txt
file.
- Run
python nli/data_preprocessing/preprocess.py
to preprocess the datasets and generate vocabulary files.
- Run
python nli/ppo/train_ppo_model.py
to train the model using the SNLI or MultiNLI dataset.
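
Because preprocessing depends on the archives having been extracted first, it can help to check the data folder before running it. The sketch below is an assumption-laden illustration, not repository code: the function names are hypothetical, and it assumes that extracted archives appear as entries of `data/nli` next to the `external` folder that holds `urls.txt`.

```python
from pathlib import Path


def extracted_entries(data_dir="data/nli"):
    """List entries in data_dir other than the 'external' folder."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name != "external")


def require_nli_data(data_dir="data/nli"):
    """Raise early if nothing appears to have been extracted yet."""
    entries = extracted_entries(data_dir)
    if not entries:
        raise FileNotFoundError(
            f"no extracted archives found in {data_dir}; "
            "see data/nli/external/urls.txt for download URLs"
        )
    return entries
```

Running `require_nli_data()` from the repository root before the preprocessing step fails fast with a pointer to the URL list instead of failing partway through preprocessing.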
The code is tested with Python 3.6.3 and PyTorch 1.0.1.
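
To compare a local environment with the tested versions, something like the following can be used. It is a minimal sketch: `environment_report` is a hypothetical helper, and the prefix comparison for PyTorch is an assumption made so that local build suffixes on the version string still count as a match.

```python
import importlib.util
import sys

# Versions stated above as the tested configuration.
TESTED_PYTHON = (3, 6, 3)
TESTED_PYTORCH = "1.0.1"


def environment_report():
    """Compare the running environment with the tested versions."""
    current = tuple(sys.version_info[:3])
    report = {
        "python": ".".join(str(part) for part in current),
        "python_matches": current == TESTED_PYTHON,
        "pytorch": None,
        "pytorch_matches": False,
    }
    # Only import torch if it is installed, so the check never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        version = str(torch.__version__)
        report["pytorch"] = version
        report["pytorch_matches"] = version.startswith(TESTED_PYTORCH)
    return report
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one the code was tested with.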
Latent-TreeLSTM is MIT licensed. See the LICENSE file for details.