
papers, learning notes and practice in NLP

Primary Language: Python

from kaggle


papers read & learning notes & some code


| Model | Title | Resources | Remarks |
|---|---|---|---|
| Word2Vec | Efficient Estimation of Word Representations in Vector Space | [paper] | - |
| Negative sampling | Distributed Representations of Words and Phrases and their Compositionality | [paper] | - |
| Transformer | Attention Is All You Need | [paper] | Google, 2017 |
| BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [paper] | Google, 2018 |


[Stanford CS224N Study Notes] 01-Introduction and Word Vectors

Text Classification

  • Utils
    • generate_w2v: train word embeddings using gensim.
    • data_helper: load datasets, clean the data, and split it into training and validation sets.
  • BaseModel: a base model that handles parameter initialization, embedding initialization, the loss function and accuracy metric, and common APIs such as compile, fit, and predict.
  • FastText
  • TextCNN
  • TextRNN
  • TextBiLSTM
  • TextRCNN
  • HAN
  • BiLSTM+Attention
  • Transformer
  • ...
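
The cleaning and train/validation split that data_helper is described as doing can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `clean_text` and `train_valid_split`, the cleaning rules, and the sample data are all assumptions made for the example.

```python
import random
import re


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def train_valid_split(samples, valid_ratio=0.2, seed=42):
    """Shuffle a copy of the (text, label) pairs and hold out valid_ratio of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_valid = int(len(shuffled) * valid_ratio)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical toy dataset of (text, label) pairs.
raw = [
    ("Great movie!!!", 1),
    ("Terrible...  plot", 0),
    ("Loved it :)", 1),
    ("Not good, not bad", 0),
    ("Would watch again", 1),
]
cleaned = [(clean_text(text), label) for text, label in raw]
train, valid = train_valid_split(cleaned, valid_ratio=0.2)
```

The English-only regular expression is a simplification; Chinese or mixed-language corpora would need a different cleaning rule and a tokenizer.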


  • Bert+CRF
  • Bert+BiLSTM+CRF
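
In both models above, the CRF layer picks the highest-scoring tag sequence by combining per-token emission scores (from BERT or the BiLSTM) with tag-to-tag transition scores. A minimal Viterbi decoding sketch in plain Python is shown below. It is an illustration only: the function name `viterbi_decode`, the three-tag scheme, and all scores are assumptions, and start/end transitions are omitted for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sentence.

    emissions[t][j]: score of tag j at token t (assumed to come from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this token.
            best_prev = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    # Follow the backpointers from the best final tag to recover the path.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return scores[last], path


# Hypothetical tag set: 0 = O, 1 = B, 2 = I. The O -> I transition is penalized.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B
    [0.0, 0.0, 0.5],    # from I
]
emissions = [
    [2.0, 1.5, 0.0],
    [0.0, 0.5, 2.0],
    [2.0, 0.0, 0.5],
]
best_score, best_path = viterbi_decode(emissions, transitions)
```

With these toy scores, taking the per-token argmax would give O, I, O, which contains the penalized O -> I step; Viterbi decoding returns B, I, O instead, which is the kind of constraint a CRF layer adds on top of the encoder.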

Content Embedding

  • Bert-Whitening
  • Sentence-Bert
  • SimCSE
  • ESimCSE
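
SimCSE-style methods train sentence embeddings with a contrastive objective: each sentence is pulled toward its positive pair and pushed away from the other sentences in the batch. The sketch below shows that in-batch loss in plain Python. It is only an illustration under stated assumptions: the function names, the toy 2-dimensional embeddings, and the default temperature of 0.05 are chosen for the example, and a real implementation would operate on encoder outputs in a deep learning framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive (InfoNCE) loss.

    positives[i] is the positive for anchors[i]; every other positives[j]
    in the batch acts as a negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, pos) / temperature for pos in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
shuffled = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped, so pairs mismatch
loss_aligned = contrastive_loss(anchors, aligned)
loss_shuffled = contrastive_loss(anchors, shuffled)
```

The loss is near zero when each anchor is most similar to its own positive and grows large when the pairs are mismatched.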

Text Matching

  • Siamese LSTM
  • DSSM
  • ESIM
  • DIIN

Text Generation

  • [ ]


  • ONNX (ONNX Runtime with C++)
  • TensorRT