An Empirical Study on Pre-trained Language Models for Neural Keyphrase Generation.
- RNN-based Seq2seq model
- Transformer
For more details, see the documentation.
- BERT
- SciBERT
- RoBERTa
Also known as Sequence-to-Sequence PLMs.
- MASS
- BART
- ProphetNet
- UniLM
- UniLMv2
- MiniLM
- MiniLMv2
- bert_uncased_L-2_H-768_A-12
- bert_uncased_L-4_H-768_A-12
- bert_uncased_L-6_H-768_A-12
- bert_uncased_L-8_H-768_A-12
- bert_uncased_L-10_H-768_A-12
- bert_uncased_L-12_H-768_A-12
- bert_uncased_L-6_H-128_A-2
- bert_uncased_L-6_H-256_A-4
- bert_uncased_L-6_H-512_A-8
- bert_uncased_L-6_H-768_A-12
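
The checkpoint names above encode the model size: `L` is the number of layers, `H` the hidden size, and `A` the number of attention heads. A minimal sketch of reading those values from a name follows; the `BertSize` type and `parse_bert_name` helper are hypothetical names used for illustration and are not part of this repository.

```python
import re
from typing import NamedTuple


class BertSize(NamedTuple):
    """Size fields encoded in a checkpoint name (hypothetical helper type)."""
    layers: int
    hidden: int
    heads: int


# Matches the trailing "L-<layers>_H-<hidden>_A-<heads>" part of a name.
_SIZE_PATTERN = re.compile(r"L-(\d+)_H-(\d+)_A-(\d+)$")


def parse_bert_name(name: str) -> BertSize:
    """Extract (layers, hidden, heads) from a name like bert_uncased_L-6_H-512_A-8."""
    match = _SIZE_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    layers, hidden, heads = (int(group) for group in match.groups())
    return BertSize(layers, hidden, heads)


size = parse_bert_name("bert_uncased_L-6_H-512_A-8")
print(size.layers, size.hidden, size.heads, size.hidden // size.heads)
```

For every checkpoint listed above, `hidden // heads` works out to 64, so the lists vary depth and width while the per-head dimension stays fixed.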
BERT (6L-768H-12A) vs. MASS-base-uncased (6L-768H-12A)
- Both models use the same WordPiece vocabulary (30,522 tokens).
- Both models are pre-trained on Wikipedia + BookCorpus.
- BERT has 66M parameters, whereas MASS-base-uncased has 123M parameters.
- If we do not load the pre-trained weights, the comparison reduces to the architectures alone:
- Transformer Encoder (6L-768H-12A) vs. Transformer (6L-768H-12A)
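
The gap between 66M and 123M parameters can be roughly accounted for by the decoder stack that MASS adds on top of the encoder. The back-of-the-envelope count below is a sketch under stated assumptions, not the exact accounting of either implementation: it assumes a feed-forward size of 4x the hidden size, 512 learned positions, embeddings shared between encoder and decoder, and it ignores small terms such as token-type embeddings, the embedding layer norm, and BERT's pooler. All function names are hypothetical.

```python
def attention_params(hidden: int) -> int:
    # Query, key, value, and output projections, each with a bias.
    return 4 * (hidden * hidden + hidden)


def encoder_layer_params(hidden: int) -> int:
    ffn = 4 * hidden  # assumption: feed-forward size is 4x hidden
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * (2 * hidden)  # two layer norms, scale + bias each
    return attention_params(hidden) + feed_forward + layer_norms


def decoder_layer_params(hidden: int) -> int:
    # An encoder-style layer plus cross-attention and its layer norm.
    return encoder_layer_params(hidden) + attention_params(hidden) + 2 * hidden


def embedding_params(vocab: int, hidden: int, max_positions: int = 512) -> int:
    # Token embeddings plus learned position embeddings (assumed 512 positions).
    return vocab * hidden + max_positions * hidden


def encoder_only_params(layers: int, hidden: int, vocab: int) -> int:
    return embedding_params(vocab, hidden) + layers * encoder_layer_params(hidden)


def encoder_decoder_params(layers: int, hidden: int, vocab: int) -> int:
    # Assumption: the decoder reuses the encoder's embeddings.
    return encoder_only_params(layers, hidden, vocab) + layers * decoder_layer_params(hidden)


encoder_only = encoder_only_params(layers=6, hidden=768, vocab=30522)
encoder_decoder = encoder_decoder_params(layers=6, hidden=768, vocab=30522)
print(f"encoder only:    {encoder_only / 1e6:.1f}M")
print(f"encoder-decoder: {encoder_decoder / 1e6:.1f}M")
```

Under these assumptions the estimates land close to the 66M and 123M figures quoted above, which suggests the difference comes almost entirely from the six decoder layers and their cross-attention blocks.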
MiniLMv2 Distilled from Different Teacher Models
- MiniLMv2-[6L-768H]-distilled-from-RoBERTa-large
- MiniLMv2-[6L-768H]-distilled-from-BERT-large
- MiniLMv2-[6L-768H]-distilled-from-BERT-base