papers, learning notes and practice in NLP

Primary language: Python

from kaggle

NLP-Space

Papers read, learning notes, and some code.


Papers

| Model | Title | Resources | Remarks |
| --- | --- | --- | --- |
| Word2Vec | Efficient Estimation of Word Representations in Vector Space | [paper] | - |
| Negative sampling | Distributed Representations of Words and Phrases and their Compositionality | [paper] | - |
| Transformer | Attention Is All You Need | [paper] | Google, 2017 |
| Bert | Pre-training of Deep Bidirectional Transformers for Language Understanding | [paper] | Google, 2018 |

Learning-Notes

[Stanford CS224N Study Notes] 01 - Introduction and Word Vectors
Word2Vec Study Notes (SVD, derivation of the underlying principles)

Text Classification

  • Utils
    • generate_w2v: train word embeddings with gensim.
    • data_helper: load datasets, clean the data, and split it into training and validation sets.
  • BaseModel: a base model that handles parameter initialization, embedding initialization, the loss function and accuracy metric, and basic APIs such as compile, fit, and predict.
  • FastText
  • TextCNN
  • TextRNN
  • TextBiLSTM
  • TextRCNN
  • HAN
  • BiLSTM+Attention
  • Transformer
  • ...
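
The cleaning and train/validation split that data_helper is described as doing can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the names `clean_text` and `train_valid_split`, the English-only cleaning regex, the 20% validation ratio, and the toy samples are all assumptions.

```python
import random
import re


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    # Assumption: English-only input; other languages need a different rule.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def train_valid_split(samples, valid_ratio=0.2, seed=42):
    """Shuffle a copy of `samples` and split it into (train, valid) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy (text, label) pairs, made up for illustration.
raw = [
    ("Great movie!!", 1),
    ("Terrible... plot", 0),
    ("Loved it :)", 1),
    ("Not good", 0),
    ("Would watch again", 1),
]
cleaned = [(clean_text(text), label) for text, label in raw]
train, valid = train_valid_split(cleaned, valid_ratio=0.2)
```

Fixing the seed means every model in the list is trained and validated on the same split, which keeps their accuracy numbers comparable.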

NER

  • BiLSTM+CRF
  • Bert+CRF
  • Bert+BiLSTM+CRF
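
All three models above end in a CRF layer, which at prediction time typically picks the highest-scoring tag sequence with Viterbi decoding. The sketch below shows that step in plain Python under simplifying assumptions: there are no start/stop transition scores, the function name `viterbi_decode` is hypothetical, and the emission and transition scores are made-up numbers rather than outputs of a trained encoder.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions[t][j]: score of tag j at position t (e.g. from a BiLSTM or BERT encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag so far
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this position.
            best_prev = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


# Tags: 0 = O, 1 = B, 2 = I. The O -> I transition is heavily penalized.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B
    [0.0, 0.0, 0.0],    # from I
]
emissions = [
    [1.0, 2.0, 0.0],
    [0.5, 0.0, 2.0],
    [2.0, 0.0, 0.5],
]
score, path = viterbi_decode(emissions, transitions)
print(score, path)  # 7.0 [1, 2, 0], i.e. B, I, O
```

The transition matrix is what lets the CRF rule out invalid sequences such as an I tag directly after an O tag, which a per-token softmax cannot do.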

Content Embedding

  • Bert-Whitening
  • Sentence-Bert
  • SimCSE
  • ESimCSE
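
SimCSE trains sentence embeddings with a contrastive objective: each sentence is pulled toward its positive and pushed away from the other sentences in the batch. The sketch below computes that in-batch loss in plain Python on toy 2-d vectors. The function names, the vectors, and the temperature of 0.05 are assumptions for illustration; in unsupervised SimCSE the positive would be a second encoding of the same sentence under a different dropout mask.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positives[j] in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings, made up for illustration.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]

loss_aligned = info_nce_loss(anchors, positives)
loss_shuffled = info_nce_loss(anchors, positives[::-1])  # wrong pairing
```

With the correct pairing the loss is close to zero, while pairing each anchor with the wrong positive gives a much larger value, which is the signal the encoder is trained on.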

Text Matching

  • Siamese LSTM
  • DSSM
  • ESIM
  • DIIN

Text Generation

  • [ ]

Inference

  • ONNX (ONNX Runtime in C++)
  • TensorRT