BERT implementation in PyTorch
Embedding
- Positional Encoding
- Word Embedding
- Segment Embedding
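
The three embeddings listed above are summed element-wise to form the encoder input. A minimal sketch, assuming learned positional encodings and the default BERT-base dimensions (the class name and sizes here are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class BERTEmbedding(nn.Module):
    """Sum of word, positional, and segment embeddings, followed by dropout."""
    def __init__(self, vocab_size=30522, max_len=512, d_model=768, dropout=0.1):
        super().__init__()
        self.token = nn.Embedding(vocab_size, d_model, padding_idx=0)  # word embedding
        self.position = nn.Embedding(max_len, d_model)                 # learned positional encoding
        self.segment = nn.Embedding(2, d_model)                        # sentence A / sentence B
        self.dropout = nn.Dropout(dropout)

    def forward(self, tokens, segments):
        # positions: (1, seq_len), broadcast across the batch
        positions = torch.arange(tokens.size(1), device=tokens.device).unsqueeze(0)
        x = self.token(tokens) + self.position(positions) + self.segment(segments)
        return self.dropout(x)
```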
Transformer Encoder
- Multi-head Attention
- Position-wise Feed-Forward Network
- Residual Connection + Layer Normalization
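
One encoder block wires these three pieces together: multi-head self-attention and the position-wise FFN, each wrapped in a residual connection followed by layer normalization (post-norm, as in the original BERT). A sketch using `torch.nn.MultiheadAttention` for brevity; the hyperparameter defaults are BERT-base values and the class name is hypothetical:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder block: self-attention + FFN, each with
    a residual connection and LayerNorm applied after the sublayer."""
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.dropout(attn_out))        # residual + LayerNorm
        x = self.norm2(x + self.dropout(self.ffn(x)))     # residual + LayerNorm
        return x
```

Stacking 12 such layers (for BERT-base) yields the full encoder.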
Pre-training on Two Tasks
- Next Sentence Prediction
- Masked Language Model
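
Both pre-training objectives read off the encoder's final hidden states: NSP classifies the `[CLS]` vector (first position) as is-next / not-next, and MLM projects every position to vocabulary logits for the masked tokens. A minimal sketch of the two output heads; the class and attribute names are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn

class PretrainHeads(nn.Module):
    """Output heads for the two pre-training tasks."""
    def __init__(self, d_model=768, vocab_size=30522):
        super().__init__()
        self.nsp = nn.Linear(d_model, 2)           # Next Sentence Prediction: is_next / not_next
        self.mlm = nn.Linear(d_model, vocab_size)  # Masked LM: per-token vocabulary logits

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model) from the encoder stack
        nsp_logits = self.nsp(hidden[:, 0])        # [CLS] sits at position 0
        mlm_logits = self.mlm(hidden)
        return nsp_logits, mlm_logits
```

At training time the two cross-entropy losses (NSP over the pair label, MLM over the masked positions only) are summed.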
Reference
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  (https://arxiv.org/abs/1810.04805)