This is my implementation of Google AI's BERT model (paper), built with Question-Answering as the target use case. The official repository is here. Through this project I intend to develop a better grasp of TensorFlow and of the language-model training practices introduced in the paper.
- Embedding (see the embedding sketch below)
  - Token embeddings: WordPiece
  - Segment embeddings
  - Position embeddings
- Encoder (see the encoder-layer sketch below)
  - Stacked Transformer encoders
    - Self-attention
    - Feed-forward network
    - Layer normalization
    - Residual sublayer connection
- Masked LM (see the pre-training heads sketch below)
- Next Sentence Prediction (see the pre-training heads sketch below)
- SQuAD (see the span-prediction sketch below)
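
The three embedding types are summed element-wise to form the input representation. Below is a minimal sketch of that step in TensorFlow/Keras; the class name `BertEmbedding` is mine, and the default sizes (30,522 WordPiece vocabulary, 512 max positions, hidden size 768) are the BERT-base values from the paper, not necessarily what this repo ends up using.

```python
import tensorflow as tf

class BertEmbedding(tf.keras.layers.Layer):
    """Sums WordPiece token, segment, and learned position embeddings."""

    def __init__(self, vocab_size=30522, max_len=512, hidden_size=768, dropout=0.1):
        super().__init__()
        self.token_emb = tf.keras.layers.Embedding(vocab_size, hidden_size)
        self.segment_emb = tf.keras.layers.Embedding(2, hidden_size)   # sentence A / B
        self.position_emb = tf.keras.layers.Embedding(max_len, hidden_size)
        self.norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.dropout = tf.keras.layers.Dropout(dropout)

    def call(self, token_ids, segment_ids, training=False):
        seq_len = tf.shape(token_ids)[1]
        positions = tf.range(seq_len)                    # [0, 1, ..., L-1]
        x = (self.token_emb(token_ids)
             + self.segment_emb(segment_ids)
             + self.position_emb(positions))             # broadcasts over the batch
        return self.dropout(self.norm(x), training=training)
```

Note that BERT learns its position embeddings rather than using the fixed sinusoidal encoding of the original Transformer, which is why they appear here as a trainable `Embedding` layer.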
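
Each encoder layer applies self-attention followed by a position-wise feed-forward network, wrapping each sublayer in a residual connection followed by layer normalization. Here is a sketch of one such layer, again with BERT-base defaults (12 heads, FFN size 3072); `EncoderLayer` is a name I chose, and the full model stacks 12 of these.

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """One Transformer encoder block: self-attention and FFN sublayers,
    each with a residual connection followed by layer normalization."""

    def __init__(self, hidden_size=768, num_heads=12, ffn_size=3072, dropout=0.1):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=hidden_size // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ffn_size, activation="gelu"),
            tf.keras.layers.Dense(hidden_size),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.dropout = tf.keras.layers.Dropout(dropout)

    def call(self, x, attention_mask=None, training=False):
        # Self-attention sublayer: residual add, then layer norm (post-LN, as in BERT).
        attn_out = self.attn(x, x, attention_mask=attention_mask, training=training)
        x = self.norm1(x + self.dropout(attn_out, training=training))
        # Position-wise feed-forward sublayer, same residual pattern.
        ffn_out = self.ffn(x)
        return self.norm2(x + self.dropout(ffn_out, training=training))
```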
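
Pre-training optimizes two objectives jointly: masked-token prediction over the full vocabulary, and a binary next-sentence classifier on the pooled `[CLS]` vector. Below is a rough sketch of both heads on top of the encoder output; the names are mine, and the weight tying between the MLM output projection and the input token embeddings used in the official implementation is omitted for brevity.

```python
import tensorflow as tf

class PretrainingHeads(tf.keras.layers.Layer):
    """Masked-LM and next-sentence-prediction heads over encoder output."""

    def __init__(self, vocab_size=30522, hidden_size=768):
        super().__init__()
        # MLM: transform each token representation, then project to vocab logits.
        self.mlm_transform = tf.keras.layers.Dense(hidden_size, activation="gelu")
        self.mlm_norm = tf.keras.layers.LayerNormalization(epsilon=1e-12)
        self.mlm_logits = tf.keras.layers.Dense(vocab_size)
        # NSP: binary classifier on the pooled [CLS] representation.
        self.pooler = tf.keras.layers.Dense(hidden_size, activation="tanh")
        self.nsp_logits = tf.keras.layers.Dense(2)

    def call(self, sequence_output):
        mlm = self.mlm_logits(self.mlm_norm(self.mlm_transform(sequence_output)))
        cls = self.pooler(sequence_output[:, 0])   # [CLS] sits at position 0
        nsp = self.nsp_logits(cls)
        return mlm, nsp                            # shapes [B, L, V] and [B, 2]
```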
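
For SQuAD, fine-tuning adds only a span classifier on top of the final hidden states: each token receives a start logit and an end logit, and the predicted answer is the highest-scoring valid (start, end) pair. A minimal sketch, with `SquadHead` a hypothetical name:

```python
import tensorflow as tf

class SquadHead(tf.keras.layers.Layer):
    """Predicts answer-span start/end positions, as in BERT's SQuAD setup."""

    def __init__(self):
        super().__init__()
        # One dense layer producing two logits per token: start and end.
        self.span_logits = tf.keras.layers.Dense(2)

    def call(self, sequence_output):
        logits = self.span_logits(sequence_output)          # [B, L, 2]
        start_logits, end_logits = tf.unstack(logits, axis=-1)
        return start_logits, end_logits                     # each [B, L]
```

At inference time a simple decoding step is to take `tf.argmax` over each set of logits and keep the pair only if the start position does not come after the end position; the official implementation additionally scores the top-k start/end combinations.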