Code for the NAACL 2018 paper "Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph"
@inproceedings{dai2018improving,
title={Improving Implicit Discourse Relation Classification by Modeling Inter-dependencies of Discourse Units in a Paragraph},
author={Dai, Zeyu and Huang, Ruihong},
booktitle={Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
volume={1},
pages={141--151},
year={2018}
}
To run the code:
- Download the preprocessed PDTB v2.0 data in .pt format (all word/POS/NER/label (both implicit & explicit) and discourse unit (DU) boundary information is already converted to PyTorch tensor format) and put it in the folder ./data/
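The exact .pt schema is not documented here, but roughly speaking each paragraph carries per-DU token features plus one relation label per adjacent DU pair. A purely hypothetical sketch (all field names are illustrative, not the repository's actual keys):

```python
# Hypothetical sketch of a single preprocessed paragraph instance; the real
# .pt schema is not documented here, so every field name is illustrative.
instance = {
    "words": [[12, 45, 7], [3, 99, 21]],  # word indices, one list per discourse unit (DU)
    "pos":   [[1, 2, 3],  [2, 2, 4]],     # POS-tag indices aligned with "words"
    "ner":   [[0, 0, 5],  [0, 6, 0]],     # NER-tag indices aligned with "words"
    "du_boundaries": [(0, 3), (3, 6)],    # token span of each DU within the paragraph
    "labels": ["Comparison"],             # one relation label per adjacent DU pair
}

# Token-level fields stay aligned with the DU boundaries.
assert len(instance["words"]) == len(instance["du_boundaries"])
# N discourse units yield N - 1 adjacent-pair relations.
assert len(instance["labels"]) == len(instance["words"]) - 1
```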
- For the model without CRF, run
python run_discourse_parsing.py
- For the model with CRF, run
python run_CRF_discourse_parsing.py
- For binary classification, run
python run_binary_target_discourse_parsing.py
- You can change the hyperparameters in each .py file, in the constants defined before the main() function (sorry, there is no separate config file). Feel free to contact me if you need the pretrained model files.
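As a rough sketch of what editing those constants looks like (names and values here are hypothetical, not the scripts' actual settings):

```python
# Illustrative only: hypothetical hyperparameter constants as they might
# appear before main() in run_discourse_parsing.py; real names/values differ.
HIDDEN_SIZE = 300      # size of the DU-level RNN hidden state
LEARNING_RATE = 1e-3   # optimizer step size
BATCH_SIZE = 16        # paragraphs per batch
NUM_EPOCHS = 30        # full passes over the training set

def main():
    # The training code reads the module-level constants above.
    return {"hidden": HIDDEN_SIZE, "lr": LEARNING_RATE}

config = main()
```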
About preprocessing:
- Download both the Google word2vec embeddings and the preprocessed POS/NER files and put them in ./data/resource (you can also generate the POS/NER files yourself with the Stanford CoreNLP toolkit)
- The raw PDTB v2.0 dataset files are already in ./data/preprocess/dataset/
- run
python pdtb_preprocess_moreexpimp_paragraph.py
Package versions:
python == 2.7.10
torch == 0.3.0
nltk >= 3.2.2
gensim >= 0.13.2
numpy >= 1.13.1
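For convenience, the pins above can be collected into a requirements.txt fragment (note that Python 2.7 and torch 0.3.0 are legacy versions):

```
torch==0.3.0
nltk>=3.2.2
gensim>=0.13.2
numpy>=1.13.1
```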