
DictPrompt

Implementation of the paper: DictPrompt: Comprehensive Dictionary-integrated Prompt Tuning for Pre-trained Language Model

TL;DR

We propose a generalized dictionary-based additional pre-training method built on prompt learning.

Abstract

The textual semantics contained in a PLM (Pre-trained Language Model) are constrained by the text distribution of the original training corpus. Because low-frequency words lack sufficient contextual training data, their representations in the PLM often fail to capture their actual semantics. Previous research has shown that semantic information from dictionaries can alleviate this problem. Unfortunately, these works neglected the rich potential of example sentences for target words with various meanings. To re-explore dictionary-based enhancement of PLMs, we propose DictPrompt, a novel comprehensive dictionary-based tuning approach that integrates prompt learning. We first collect a dataset based on the Oxford Advanced Learner’s English Dictionary. Then, we design a set of comprehensive prompt templates that combine each word with its definition and example sentence. Finally, we insert a word-game training task between pre-training and fine-tuning using these templates, injecting more semantic information into the PLM. We test our DictPrompt tuning method on three commonly used PLMs. Results on five fine-grained semantic tasks show that our dictionary-based secondary tuning brings additional gains to model performance: with our tuning, the best accuracy improves by 3.09% on average on the WiC task and by 7.93% on the WSC task. We also plot scatter diagrams of sentence embeddings for polysemous words; our method smooths the decision boundary and helps the model output more distinguishable embeddings.

Pipeline

An illustration of the DictPrompt pipeline.
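
As a rough illustration of the idea (not the paper's exact templates, data format, or training code), the sketch below shows how a cloze-style prompt could be assembled from a dictionary entry so that a masked language model must recover the target word from its definition and example sentence. The entry fields and template wording here are assumptions for illustration only.

```python
# Minimal sketch of DictPrompt-style prompt construction (illustrative only;
# the paper's actual templates and word-game training task may differ).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical dictionary entry: word, definition, and an example sentence.
entry = {
    "word": "bank",
    "definition": "the land alongside or sloping down to a river or lake",
    "example": "we walked along the bank of the river",
}

def build_cloze_prompt(entry, tokenizer):
    """Mask the target word in its example sentence and prepend the definition."""
    masked_example = entry["example"].replace(entry["word"], tokenizer.mask_token, 1)
    return (
        f'The word that means "{entry["definition"]}" '
        f"completes the sentence: {masked_example}"
    )

prompt = build_cloze_prompt(entry, tokenizer)
inputs = tokenizer(prompt, return_tensors="pt")
# `inputs` can then be fed to a masked-LM head so the model predicts the masked
# target word from the definition and its sentence context.
print(prompt)
```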

Results in our paper:

Besides comparing the models after DictPrompt tuning with the vanilla models, we also compare with the following related PLM-enhancement works: (1) SemPre [1], (2) ElitePLM [2], (3) GenDef [3], (4) Knowledge-BERT [4], (5) P-tuning [5], and (6) P-tuning V2 [6].

| Model | Variant | WiC | WSC | SM | SMR | HS |
| --- | --- | --- | --- | --- | --- | --- |
| BERT-base | vanilla (our) (avg.) | 69.75 | 63.46 | 87.21 | 83.72 | 39.17 |
| | GenDef | 68.64 | --- | --- | --- | --- |
| | ElitePLM | 69.90 | 60.60 | 87.30 | 80.10 | 40.50 |
| | SemPre (our) (avg.) | 69.91 | 65.39 | 87.31 | 81.82 | 39.19 |
| | our + all words (best) | 71.79 | 68.27 | 89.40 | 85.20 | 39.44 |
| | our + all words (avg.) | 70.53 | 66.35 | 87.92 | 84.38 | 39.26 |
| | our + rare words (avg.) | 70.46 | 66.35 | 87.53 | 83.12 | 38.27 |
| BERT-large | SuperGLUE reported | 69.50 | 64.30 | --- | --- | --- |
| | vanilla (our) (avg.) | 71.00 | 68.27 | 89.67 | 86.88 | 43.81 |
| | ElitePLM | 72.70 | 63.50 | 89.40 | 85.80 | 47.30 |
| | SemPre (our) (avg.) | 69.04 | 70.67 | 87.17 | 85.90 | 43.73 |
| | Knowledge-BERT | 70.90 | --- | --- | --- | --- |
| | P-tuning | 63.00 | 64.40 | --- | --- | --- |
| | P-tuning V2 | 75.10 | 68.30 | --- | --- | --- |
| | our + all words (best) | 73.98 | 76.92 | 90.80 | 88.80 | 45.23 |
| | our + all words (avg.) | 72.10 | 71.15 | 90.55 | 88.23 | 44.19 |
| | our + rare words (avg.) | 71.79 | 71.15 | 88.97 | 88.23 | 44.19 |
| RoBERTa-base | vanilla (from SemPre) | 69.40 | 62.50 | 75.00 | 40.00 | 41.30 |
| | vanilla (our) (avg.) | 70.93 | 66.35 | 87.70 | 88.70 | 62.10 |
| | ElitePLM | 67.20 | 71.10 | 89.30 | 87.50 | 61.00 |
| | SemPre | 71.50 | 63.60 | 77.70 | 39.50 | 41.70 |
| | SemPre (our) (avg.) | 69.51 | 64.42 | 87.80 | 88.73 | 62.41 |
| | our + all words (best) | 72.88 | 73.08 | 88.90 | 89.50 | 62.68 |
| | our + all words (avg.) | 71.47 | 67.31 | 87.80 | 88.98 | 62.29 |
| | our + rare words (avg.) | 70.38 | 66.35 | 87.80 | 88.40 | 62.10 |
| RoBERTa-large | vanilla (from SemPre) | 74.60 | 69.30 | 79.20 | 47.30 | 48.90 |
| | vanilla (our) (avg.) | 71.55 | 70.67 | 93.72 | 92.23 | 81.26 |
| | ElitePLM | 72.70 | 75.00 | 93.00 | 92.30 | 85.20 |
| | SemPre | 75.70 | 73.50 | 80.40 | 48.40 | 75.90 |
| | SemPre (our) (avg.) | 71.55 | 73.08 | 93.66 | 92.35 | 81.27 |
| | P-tuning | 56.90 | 64.40 | --- | --- | --- |
| | P-tuning V2 | 73.40 | 63.50 | --- | --- | --- |
| | our + all words (best) | 75.86 | 81.73 | 95.30 | 93.70 | 81.96 |
| | our + all words (avg.) | 74.61 | 76.92 | 94.05 | 93.08 | 81.35 |
| | our + rare words (avg.) | 73.28 | 76.92 | 93.85 | 92.39 | 81.27 |
| T5-base | vanilla (our) | 68.73 | 62.50 | 86.74 | 86.60 | 55.87 |
| | ElitePLM | 68.30 | 79.80 | 89.20 | 82.90 | 55.20 |
| | SemPre (our) | 68.42 | 62.02 | 86.58 | 85.36 | 56.53 |
| | our + all words (best) | 72.41 | 71.15 | 88.30 | 87.90 | 57.84 |
| | our + all words (avg.) | 70.14 | 64.90 | 87.53 | 86.90 | 57.63 |
| | our + rare words | 70.06 | 63.94 | 87.04 | 86.90 | 57.63 |
| T5-large | vanilla (our) | 71.32 | 69.23 | 92.48 | 91.23 | 82.85 |
| | ElitePLM | 69.30 | 84.60 | 92.70 | 88.20 | 79.10 |
| | SemPre (our) | 71.32 | 68.27 | 92.69 | 90.48 | 82.92 |
| | our + all words (best) | 74.92 | 76.92 | 93.60 | 92.10 | 83.21 |
| | our + all words (avg.) | 72.57 | 72.12 | 92.73 | 91.39 | 82.99 |
| | our + rare words | 72.10 | 72.12 | 92.73 | 91.32 | 82.96 |

Comparison of the proposed DictPrompt with baselines on five downstream tasks (WiC, WSC, SM, SMR, HS). In the paper's table, bold marks the best result and underline marks the second-best result.

Requirements

You may need the following environment dependencies:

  • python: 3.9.0
  • pytorch: 1.8.2
  • transformers: 4.20.0
  • datasets: 1.18.4
  • scikit-learn: 1.0.1
  • numpy: 1.21.4
  • jsonlines: 3.0.0
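
If it helps, the pinned versions above can be captured in a `requirements.txt` like the sketch below (assuming the usual PyPI package names; PyTorch installs as `torch`, the 1.8.2 LTS wheels may need the PyTorch LTS index rather than the default PyPI index, and the Python version refers to the interpreter, not a pip package):

```
torch==1.8.2
transformers==4.20.0
datasets==1.18.4
scikit-learn==1.0.1
numpy==1.21.4
jsonlines==3.0.0
```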

References

1. Ren X, Sun X, Wang H, et al. Towards semantics-enhanced pre-training: Can lexicon definitions help learning sentence meanings?[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(15): 13736-13744.

2. Li J, Tang T, Gong Z, et al. ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models[J]. arXiv preprint arXiv:2205.01523, 2022.

3. Chang T Y, Chen Y N. What does this word mean? Explaining contextualized embeddings with natural language definition[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019: 6064-6070.

4. Peters M E, Neumann M, Logan R, et al. Knowledge enhanced contextual word representations[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

5. Liu X, Zheng Y, Du Z, et al. GPT understands, too[J]. arXiv preprint arXiv:2103.10385, 2021.

6. Liu X, Ji K, Fu Y, et al. P-Tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2022: 61-68.

Citation

Please cite the paper if it helps:

@article{cao2023dictprompt,
  title={DictPrompt: Comprehensive dictionary-integrated prompt tuning for pre-trained language model},
  author={Cao, Rui and Wang, Yihao and Gao, Ling and Yang, Meng},
  journal={Knowledge-Based Systems},
  pages={110605},
  year={2023},
  publisher={Elsevier}
}