NLP

Distributed Representation

Neural Language Model

A Neural Probabilistic Language Model

Pretrained Word Embedding

Word2Vec

  1. from gensim.models import Word2Vec
  2. Distributed Representations of Words and Phrases and Their Compositionality
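
Expanding item 1, a minimal gensim usage sketch (gensim >= 4, where the parameter is `vector_size`); the toy corpus and hyperparameters are illustrative, not recommendations:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of pre-tokenized sentences (illustrative only).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# Skip-gram (sg=1) with negative sampling, as in Mikolov et al.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                 sg=1, negative=5)

vec = model.wv["cat"]                # 50-d embedding for "cat"
print(model.wv.most_similar("cat"))  # nearest neighbors by cosine similarity
```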

GloVe

  1. Source Code
  2. GloVe: Global Vectors for Word Representation
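
The core of GloVe is a weighted least-squares objective over the co-occurrence matrix: J = sum_ij f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2. A NumPy sketch of that loss (function and variable names are mine, not from the released source):

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """GloVe objective: sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    W, W_tilde: (V, d) word and context embedding matrices.
    b, b_tilde: (V,) bias vectors.  X: (V, V) co-occurrence counts.
    """
    # f(x) down-weights rare pairs and caps the influence of frequent ones.
    f = np.where(X < x_max, (X / x_max) ** alpha, 1.0)
    mask = X > 0                                   # log(0) is undefined
    log_X = np.log(np.where(mask, X, 1.0))
    err = W @ W_tilde.T + b[:, None] + b_tilde[None, :] - log_X
    return np.sum(f * mask * err ** 2)
```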

Pretrained Language Model

ELMo

Paper: Deep contextualized word representations

  1. Bidirectional language model (biLSTM)
  2. Residual connections
  3. Character-level token embeddings (via a character CNN)
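
The task-specific representation in the paper is a learned softmax-weighted sum of the biLM layers, scaled by a learned gamma. A PyTorch sketch of just that combination step (shapes and names are illustrative):

```python
import torch

def elmo_combine(layer_states, s_logits, gamma):
    """ELMo_k = gamma * sum_j softmax(s)_j * h_{k,j}.

    layer_states: (L, seq_len, dim) hidden states of the L biLM layers
                  (layer 0 being the character-CNN token embedding).
    s_logits:     (L,) learned scalars, softmax-normalized into weights.
    gamma:        learned task-specific scale.
    """
    s = torch.softmax(s_logits, dim=0)
    return gamma * (s[:, None, None] * layer_states).sum(dim=0)

# Toy usage: 3 layers, 5 tokens, 1024-dim states.
h = torch.randn(3, 5, 1024)
out = elmo_combine(h, torch.zeros(3), torch.tensor(1.0))  # (5, 1024)
```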

GPT

Improving Language Understanding by Generative Pre-Training [GitHub]

  1. Left-to-right (unidirectional) LM
  2. Multi-layer Transformer decoder as the language model; the same pretrained network is fine-tuned with task-specific input transformations
  3. BPE (byte pair encoding) tokens (sketched below)
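
For the BPE point, a compact sketch of the merge-learning loop from Sennrich et al.; GPT builds its subword vocabulary with the same idea. The toy vocabulary and function name are illustrative:

```python
import re
from collections import Counter

def learn_bpe(vocab, num_merges):
    """Learn BPE merge rules from {word as space-joined symbols: count}."""
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Merge the most frequent pair wherever it occurs as whole symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), w): c for w, c in vocab.items()}
    return merges

print(learn_bpe({"l o w </w>": 5, "l o w e r </w>": 2}, 3))
```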

BERT

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [GitHub]

  1. Bidirectional LM via masked tokens (80/10/10 corruption; sketched below)
  2. WordPiece (subword-level) tokens
  3. Masked LM trained jointly (multi-task) with Next Sentence Prediction (NSP)
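
A sketch of BERT's masked-LM corruption rule: about 15% of positions are selected; of those, 80% become [MASK], 10% a random token, 10% stay unchanged. Simplified to plain string tokens (real BERT works on WordPiece ids and skips special tokens):

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """Return (corrupted inputs, labels); labels are None where no prediction."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                       # predict the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = "[MASK]"              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.choice(vocab)  # 10%: random token
            # else: 10% keep the original token unchanged
    return inputs, labels

print(mask_tokens(["the", "cat", "sat"], vocab=["dog", "mat", "ran"]))
```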

GPT-2

Language Models are Unsupervised Multitask Learners [GitHub]

  1. Left-to-right LM
  2. Larger and deeper than GPT
  3. BPE tokens
  4. A few modifications to the Transformer block (e.g., Layer Normalization moved to the input of each sub-block; sketched below)
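
On point 4: GPT-2 applies Layer Normalization at the input of each sub-block (pre-LN) rather than after the residual add. A minimal PyTorch block showing that placement; a sketch, not the released code, with dimensions and the causal mask left to the caller:

```python
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Transformer block with GPT-2-style (pre-LN) layer-norm placement."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)                                     # LN before the sub-block
        x = x + self.attn(h, h, h, attn_mask=attn_mask)[0]  # residual around attention
        x = x + self.mlp(self.ln2(x))                       # residual around MLP
        return x
```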

XLNet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

  1. Bidirectional context via random permutations of the factorization order instead of [MASK] corruption (no pretrain/finetune mismatch, and better suited to text generation; see the sketch below)
  2. Two-stream self-attention (a content stream and a query stream)
  3. Drops the Next Sentence Prediction (NSP) objective
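
A sketch of the permutation idea from point 1: sample one factorization order and let position i attend only to positions that precede it in that order (ignoring XLNet's two-stream and memory details):

```python
import torch

def permutation_attention_mask(seq_len):
    """mask[i, j] = True iff i may attend to j under a random factorization order."""
    perm = torch.randperm(seq_len)        # random order over positions
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[perm] = torch.arange(seq_len)    # rank[i] = i's place in the order
    return rank.unsqueeze(1) > rank.unsqueeze(0)

print(permutation_attention_mask(5))
```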

RoBERTa

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Toolkit: FAIRSEQ: A Fast, Extensible Toolkit for Sequence Modeling [GitHub]

  1. Bidirectional LM with dynamic masking: a fresh mask pattern is sampled on every pass over the data (more robust than BERT's static, preprocess-once masking; see the sketch below)
  2. Experiments on NSP with various input formats and objectives show that removing the NSP loss slightly improves downstream task performance
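
The static-vs-dynamic contrast in point 1, sketched with plain position sampling (names and the toy sentence are illustrative):

```python
import random

def sample_mask_positions(tokens, mask_prob=0.15):
    """Pick a fresh random set of positions to mask."""
    return [i for i in range(len(tokens)) if random.random() < mask_prob]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
static = sample_mask_positions(sentence)       # BERT: fixed once at preprocessing
for epoch in range(3):
    dynamic = sample_mask_positions(sentence)  # RoBERTa: re-sampled every pass
    print(epoch, "static:", static, "dynamic:", dynamic)
```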

KnowBert

Knowledge Enhanced Contextual Word Representations TODO

Classification

Tasks

  1. Sentiment Classification
  2. Text Classification
  3. Textual entailment
  4. Paraphrase Identification (detection)

Convolutional Neural Networks in NLP

N-gram features of tokens (word/character level)
Paper: Convolutional Neural Networks for Sentence Classification
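
A PyTorch sketch in the spirit of Kim (2014): parallel convolutions over several kernel sizes extract n-gram features, and max-over-time pooling keeps the strongest activation per filter (hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, x):                    # x: (batch, seq_len) token ids
        e = self.emb(x).transpose(1, 2)      # (batch, emb_dim, seq_len)
        # Each conv extracts k-gram features; max-over-time pools per filter.
        pooled = [c(e).relu().max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))
```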

Recurrent Neural Networks in NLP

Long-range dependency features in text
Paper: Recurrent Convolutional Neural Networks for Text Classification
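
To illustrate the long-range-dependency point, a minimal BiLSTM classifier; this is not the exact RCNN of the cited paper, which additionally concatenates each word's embedding with its recurrent left/right context:

```python
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, seq_len) token ids
        out, _ = self.rnn(self.emb(x))     # states can carry distant context
        return self.fc(out.mean(dim=1))    # mean-pool over time, then classify
```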

CNN + RNN

N-gram features of tokens + contextual features
Paper: A C-LSTM Neural Network for Text Classification
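
A sketch of the C-LSTM combination: a convolution first maps token windows to n-gram feature vectors, then an LSTM reads that feature sequence for long-range context (hyperparameters are illustrative):

```python
import torch.nn as nn

class CLSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, n_filters=100,
                 kernel_size=3, hidden=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size)
        self.rnn = nn.LSTM(n_filters, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, seq_len)
        e = self.emb(x).transpose(1, 2)          # (batch, emb_dim, seq_len)
        f = self.conv(e).relu().transpose(1, 2)  # sequence of n-gram features
        _, (h, _) = self.rnn(f)                  # final hidden state
        return self.fc(h[-1])
```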

ELMo (biLSTM) => pretrained features

Deep contextualized word representations

Transformer

Sequence2Sequence (Encoder-Decoder)

Tasks

  1. Machine Translation [Statistical MT] [Neural MT]
  2. Text Summarization
  3. Grammar Error Correction
  4. Question Answering (QA)

Models

Seq2Seq Based on RNN

Attention
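
The attention step, in its scaled dot-product form from the Transformer paper; RNN seq2seq attention uses the same weighted-sum idea with different score functions (e.g. Bahdanau/Luong):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, with an optional boolean mask."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))  # block positions
    return torch.softmax(scores, dim=-1) @ v

# Toy shapes: batch 1, 4 positions, dim 8.
q = k = v = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(q, k, v)  # (1, 4, 8)
```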

Tensor2Tensor Based on Transformer

GPT (pretrained + finetune)

BERT (pretrained + finetune)

NLP Fundamental Tasks

Chinese Word Segmentation

Part-of-Speech Tagging

Parsing

  1. Dependency Parsing
  2. Constituency Parsing

Semantic Role Labeling

Named Entity Recognition

Paraphrase Identification

etc.