NeuralKpGen

Pre-trained Language Models for Keyphrase Generation

An Empirical Study on Pre-trained Language Models for Neural Keyphrase Generation.

Models Trained from Scratch

  • RNN-based Seq2seq model
  • Transformer

For more details, read the documentation.
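
As a rough illustration of the from-scratch Transformer baseline (not the repository's actual code), the sketch below shows a standard encoder-decoder that reads the source document and generates the target keyphrase sequence (keyphrases are typically concatenated into one delimiter-separated target). The class name and hyperparameters are illustrative assumptions.

```python
import torch.nn as nn

class TransformerKpGen(nn.Module):
    """Minimal Transformer seq2seq for keyphrase generation, trained from scratch
    (no pre-trained weights). Positional encodings are omitted for brevity."""

    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model, nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)
        ).to(tgt_ids.device)
        hidden = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                                  tgt_mask=tgt_mask)
        return self.generator(hidden)  # (batch, tgt_len, vocab_size) logits
```

The RNN-based seq2seq baseline uses the same input/output format, with a recurrent encoder-decoder plus attention in place of the Transformer.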

Pre-trained Language Models (PLMs)

Autoencoding PLMs

Autoregressive PLMs

Also known as Sequence-to-Sequence PLMs.

Autoencoding and Autoregressive PLMs

Compressed Pre-trained PLMs

Neural Network (NN) Architecture of PLMs

Shallow vs. Deeper Architectures

Wide vs. Narrow Architectures

Encoder-only vs. Encoder-Decoder Architectures

BERT (6L-768H-12A) vs. MASS-base-uncased (6L-768H-12A)

  • Both models use the same WordPiece vocabulary (of size 30,522).
  • Both models are pre-trained on Wikipedia + BookCorpus.
  • While the 6-layer BERT has 66M parameters, MASS-base-uncased has 123M parameters.
  • If we do not load the pre-trained weights, then we are effectively comparing only the two architectures (see the parameter-count sketch after this list):
    • Transformer Encoder (6L-768H-12A) vs. Transformer (6L-768H-12A)
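
The parameter gap comes mostly from the six decoder layers (each with an extra cross-attention block) that MASS adds on top of a BERT-like encoder. The back-of-the-envelope check below is a sketch using plain PyTorch modules and a shared 30,522-entry embedding; it ignores positional/segment embeddings and task heads, so it only roughly reproduces the 66M vs. 123M figures.

```python
import torch.nn as nn

VOCAB, D_MODEL, HEADS, LAYERS, FFN = 30522, 768, 12, 6, 3072

def count(module):
    return sum(p.numel() for p in module.parameters())

# Shared wordpiece embedding (30522 x 768); both models use the same vocabulary.
embedding = nn.Embedding(VOCAB, D_MODEL)

# Encoder-only stack, roughly BERT (6L-768H-12A) without extra embeddings or heads.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, HEADS, dim_feedforward=FFN),
    num_layers=LAYERS,
)

# Encoder-decoder stack, roughly MASS-base-uncased (6L encoder + 6L decoder).
encoder_decoder = nn.Transformer(
    D_MODEL, HEADS,
    num_encoder_layers=LAYERS, num_decoder_layers=LAYERS,
    dim_feedforward=FFN,
)

print(f"encoder-only:    ~{(count(embedding) + count(encoder)) / 1e6:.0f}M")          # ~66M
print(f"encoder-decoder: ~{(count(embedding) + count(encoder_decoder)) / 1e6:.0f}M")  # ~123M
```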

PLMs as Teacher for Knowledge Distillation

MiniLMv2 Distilled from Different Teacher Models

  • MiniLMv2-[6L-768H]-distilled-from-RoBERTa-large
  • MiniLMv2-[6L-768H]-distilled-from-BERT-large
  • MiniLMv2-[6L-768H]-distilled-from-BERT-base
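
MiniLMv2 students are not trained to match the teacher's output logits; they match the teacher's multi-head self-attention relations (query-query, key-key, and value-value similarity distributions) taken from a single teacher layer, which is why teachers of different widths (BERT-base, BERT-large, RoBERTa-large) can all be used. The sketch below illustrates one such relation loss for a single relation type, assuming teacher and student are compared with the same number of relation heads; it is a simplified illustration, not the released training code.

```python
import torch.nn.functional as F

def relation_kl(teacher_vecs, student_vecs, num_relation_heads):
    """KL between teacher and student self-attention relations for one relation
    type (e.g. query-query). Inputs are (batch, seq_len, hidden) tensors; the
    two hidden sizes may differ, since relations are seq_len x seq_len."""
    def relations(x):
        b, t, d = x.shape
        head_dim = d // num_relation_heads
        h = x.view(b, t, num_relation_heads, head_dim).transpose(1, 2)
        scores = h @ h.transpose(-1, -2) / head_dim ** 0.5  # (b, heads, t, t)
        return F.log_softmax(scores, dim=-1)
    student_log = relations(student_vecs)
    teacher_log = relations(teacher_vecs)
    # KL(teacher || student), summed over heads and positions, averaged over the batch.
    return F.kl_div(student_log, teacher_log, log_target=True, reduction="batchmean")
```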