BERT-1

TensorFlow implementation of BERT for QA


BERT: Bidirectional Encoder Representations from Transformers

This is my implementation of Google AI's BERT model (paper), with the specific use case of question answering in mind. The official repository is here. My aim is to develop a better grasp of TensorFlow and of the language-model training practices introduced in the paper.

Architecture

  • Embedding (see the embedding sketch after this list)
    • Token embeddings: WordPiece
    • Segment embeddings
    • Position embeddings
  • Encoder (see the encoder block sketch after this list)
    • Stacked Transformer Encoders
      • Self-attention
      • Feed-forward network
      • Layer normalization
      • Residual connection around each sub-layer
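
A minimal sketch of the embedding step, assuming a tf.keras implementation (the class name and the BERT-Base dimension constants are illustrative, not taken from this repo). The three embeddings are summed element-wise per token and layer-normalized:

```python
import tensorflow as tf

# Illustrative BERT-Base dimensions (assumptions, not this repo's config):
VOCAB_SIZE, TYPE_VOCAB_SIZE, MAX_LEN, HIDDEN = 30522, 2, 512, 768

class BertEmbeddings(tf.keras.layers.Layer):
    """Sums WordPiece token, segment, and position embeddings."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.token_emb = tf.keras.layers.Embedding(VOCAB_SIZE, HIDDEN)
        self.segment_emb = tf.keras.layers.Embedding(TYPE_VOCAB_SIZE, HIDDEN)
        self.position_emb = tf.keras.layers.Embedding(MAX_LEN, HIDDEN)
        self.norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)

    def call(self, token_ids, segment_ids):
        # Position ids are simply 0..seq_len-1; their embedding broadcasts
        # over the batch dimension when added to the other two.
        positions = tf.range(tf.shape(token_ids)[1])
        x = (self.token_emb(token_ids)
             + self.segment_emb(segment_ids)
             + self.position_emb(positions))
        return self.norm(x)  # [batch, seq_len, HIDDEN]
```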

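A matching sketch of one encoder layer, again with tf.keras and illustrative BERT-Base hyperparameters; the full encoder stacks 12 of these:

```python
import tensorflow as tf

class EncoderBlock(tf.keras.layers.Layer):
    """One Transformer encoder layer: multi-head self-attention and a
    feed-forward network, each inside a residual connection + layer norm."""

    def __init__(self, hidden=768, heads=12, ffn_dim=3072, **kwargs):
        super().__init__(**kwargs)
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=heads, key_dim=hidden // heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ffn_dim, activation="gelu"),
            tf.keras.layers.Dense(hidden),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-12)

    def call(self, x, mask=None):
        # Residual connection around self-attention, then layer norm.
        x = self.norm1(x + self.attn(x, x, attention_mask=mask))
        # Residual connection around the feed-forward network.
        return self.norm2(x + self.ffn(x))
```
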
Pre-Training

  • Masked LM (see the masking sketch below)
  • Next Sentence Prediction

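A hedged sketch of the masked-LM input corruption from the paper: 15% of positions are chosen as prediction targets, and of those 80% become [MASK], 10% a random token, and 10% stay unchanged. The ids and the -100 "ignore" label convention below are assumptions for illustration:

```python
import tensorflow as tf

MASK_ID, VOCAB_SIZE = 103, 30522  # illustrative WordPiece ids

def mask_tokens(token_ids, mask_rate=0.15):
    """Corrupt int32 token ids for masked-LM pre-training.

    Real pipelines also exclude [CLS]/[SEP]/padding from masking,
    which is omitted here for brevity.
    """
    shape = tf.shape(token_ids)
    target = tf.random.uniform(shape) < mask_rate  # positions to predict
    roll = tf.random.uniform(shape)                # 80/10/10 split
    masked = tf.where(target & (roll < 0.8), MASK_ID, token_ids)
    random_ids = tf.random.uniform(shape, 0, VOCAB_SIZE, dtype=tf.int32)
    masked = tf.where(target & (roll >= 0.8) & (roll < 0.9),
                      random_ids, masked)
    labels = tf.where(target, token_ids, -100)     # -100 = not scored
    return masked, labels
```

Next Sentence Prediction then adds a binary classifier on the final hidden state of the [CLS] token, trained on 50/50 real vs. random sentence pairs.
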
Task Fine-tuning

  • SQuAD (see the span-prediction sketch below)
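
A minimal sketch of the paper's SQuAD span-prediction head (the class name is illustrative): each token's final hidden state is projected to one start logit and one end logit.

```python
import tensorflow as tf

class SquadSpanHead(tf.keras.layers.Layer):
    """Projects encoder outputs to start/end span logits for SQuAD."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(2)  # one score each: start, end

    def call(self, sequence_output):
        # sequence_output: [batch, seq_len, hidden]
        logits = self.dense(sequence_output)         # [batch, seq_len, 2]
        start_logits, end_logits = tf.unstack(logits, axis=-1)
        return start_logits, end_logits              # each [batch, seq_len]
```

Fine-tuning minimizes the cross-entropy of the correct start and end positions over the paragraph tokens; at inference the highest-scoring valid (start, end) pair is taken as the answer span.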