Multitasking Framework for Unsupervised Simple Definition Generation

Source code for the paper "Multitasking Framework for Unsupervised Simple Definition Generation", published at ACL 2022.

Requirements

Training Environment

PyTorch
fairseq
blingfire

In order to install them, you can run this command:

pip install -r requirements-train.txt

Evaluation Environment

PyTorch
Sentence-Transformers
Jieba
NLTK
Pandas
scipy
xlrd
EASSE

To install them, run the following commands:

pip install -r requirements-eval.txt
git clone https://github.com/feralvam/easse.git
cd easse
pip install .
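Before moving on, it can help to confirm that each environment can actually import its dependencies. The function below is a minimal sketch and is not part of the repository; `check_python_modules` and the `PYTHON` variable are hypothetical, and it assumes the import names `torch`, `fairseq`, `blingfire`, `sentence_transformers`, `jieba`, `nltk`, `pandas`, `scipy`, `xlrd` and `easse`, plus a `python3` interpreter on PATH.

```shell
#!/usr/bin/env bash
# Sketch (not part of the repository): report which Python modules can be imported.
# Uses python3 by default; set the hypothetical PYTHON variable to use another interpreter.
check_python_modules() {
  local py="${PYTHON:-python3}" status=0 mod
  for mod in "$@"; do
    if "$py" -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}"
      status=1
    fi
  done
  # Non-zero exit status if at least one module failed to import.
  return "$status"
}

# Training environment:
#   check_python_modules torch fairseq blingfire
# Evaluation environment:
#   check_python_modules torch sentence_transformers jieba nltk pandas scipy xlrd easse
```

Because the two environments may live in separate virtualenvs, the function only checks whatever interpreter is active when it is called.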

Usage

All data, including the Chinese and English definition generation (DG) datasets and the simple text corpora mentioned in the paper, has been placed in the folder "data".
Please download the pretrained MASS model parameters from [en|zh], unzip them, and put the unzipped files into the folders "pretrained_model/MASS" and "pretrained_model/MASS-zh", respectively.
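A quick way to catch a misplaced download is to check the folder layout described above before preprocessing. This is a minimal sketch, not part of the repository: `check_layout` is a hypothetical helper, and it assumes it is run from the repository root, since the paths in this README are relative. It only checks that `data` and the MASS folder for the chosen language exist and are non-empty, not that the files inside are correct.

```shell
#!/usr/bin/env bash
# Sketch (not part of the repository): verify the expected folders before preprocessing.
# Assumes the current directory is the repository root.
check_layout() {
  local lang="$1" mass_dir
  case "$lang" in
    en) mass_dir="pretrained_model/MASS" ;;
    zh) mass_dir="pretrained_model/MASS-zh" ;;
    *) echo "usage: check_layout en|zh" >&2; return 2 ;;
  esac
  local status=0 d
  for d in data "$mass_dir"; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d" >&2
      status=1
    elif [ -z "$(ls -A "$d")" ]; then
      echo "empty directory: $d" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: check_layout en && bash run/data_process.sh
```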
To preprocess the dataset, please run the following command:

bash run/data_process.sh # for English
# or
bash run/data_process_zh.sh # for Chinese

To train a SimpDefiner that can simultaneously generate complex and simple definitions, run the following command:

bash run/train_oxford_oald_multi_task.sh # for English
# or
bash run/train_cwn_textbook_multi_task.sh # for Chinese
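The preprocessing and training steps above can be chained per language so that training only starts if preprocessing succeeded. This is a minimal sketch, not part of the repository: `prepare_and_train`, `run_step` and the `DRY_RUN` switch are hypothetical, and the script names are exactly the ones listed in this README.

```shell
#!/usr/bin/env bash
# Sketch (not part of the repository): preprocess, then train, for one language.
# With DRY_RUN=1 (hypothetical switch) commands are printed instead of executed.
run_step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

prepare_and_train() {
  local prep train
  case "$1" in
    en) prep=run/data_process.sh;    train=run/train_oxford_oald_multi_task.sh ;;
    zh) prep=run/data_process_zh.sh; train=run/train_cwn_textbook_multi_task.sh ;;
    *) echo "usage: prepare_and_train en|zh" >&2; return 2 ;;
  esac
  # && ensures training is skipped when preprocessing fails.
  run_step bash "$prep" && run_step bash "$train"
}

# Example: DRY_RUN=1 prepare_and_train zh   # preview the commands
#          prepare_and_train en             # actually run them
```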

Model checkpoints will be saved in a checkpoint directory; this is the directory to pass as [model-dir] in the commands below.

To evaluate the trained model and use it to generate definitions (both complex and simple), run the following command:

bash run/evaluate_oxford_oald.sh --model_dir [model-dir] # for English
# or
bash run/evaluate_cwn_textbook.sh --model_dir [model-dir] # for Chinese

The generated definitions will be saved in the same checkpoint directory.

To compute automatic metrics for the generated definitions, run the following command:

bash metrics/calc_metrics.sh [model-dir] [oxford|oald|cwn|textbook] [GPU_ID]
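The evaluation and metric commands above can be combined into one call per trained model. This is a minimal sketch, not part of the repository: `evaluate_and_score`, `run_step`, the `DRY_RUN` switch and the default GPU ID of 0 are hypothetical, and the pairing of an English model with `oxford` and `oald` and a Chinese model with `cwn` and `textbook` is inferred from the training script names rather than stated by the authors.

```shell
#!/usr/bin/env bash
# Sketch (not part of the repository): generate definitions, then score both test sets.
# With DRY_RUN=1 (hypothetical switch) commands are printed instead of executed.
run_step() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

evaluate_and_score() {
  # GPU ID defaults to 0 here; this default is an assumption, not from the repository.
  local lang="$1" model_dir="$2" gpu="${3:-0}" eval_script datasets ds
  case "$lang" in
    en) eval_script=run/evaluate_oxford_oald.sh;  datasets="oxford oald" ;;
    zh) eval_script=run/evaluate_cwn_textbook.sh; datasets="cwn textbook" ;;
    *) echo "usage: evaluate_and_score en|zh MODEL_DIR [GPU_ID]" >&2; return 2 ;;
  esac
  [ -n "$model_dir" ] || { echo "MODEL_DIR is required" >&2; return 2; }
  run_step bash "$eval_script" --model_dir "$model_dir" || return 1
  # $datasets is left unquoted on purpose so it splits into one dataset name per iteration.
  for ds in $datasets; do
    run_step bash metrics/calc_metrics.sh "$model_dir" "$ds" "$gpu" || return 1
  done
}

# Example: evaluate_and_score en [model-dir] 0
```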

Cite

@inproceedings{kong-etal-2022-simpdefiner,
    title = "Multitasking Framework for Unsupervised Simple Definition Generation",
    author = "Kong, Cunliang and
      Chen, Yun and
      Zhang, Hengyuan and
      Yang, Liner and
      Yang, Erhong",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022"
}

Contact

If you have questions, suggestions, or bug reports, please email cunliang.kong@outlook.com.
