ContinueTrainingBERT

Continue Training BERT with transformers
Continue training BERT on a vertical-domain corpus
This repository is just a simple example of BERT pre-training

🎉 Everyone is welcome to improve this repository with me 🎉

Roadmap

  • Load pretrained weights
  • Continue training
  • Implement the tokenizer class
  • Implement the BERT model structure (class)
    • Implement the BERT embedding, encoder, and pooler structure

Quickstart

1. Install transformers

pip install transformers

2. Prepare your data

NOTICE: Each line of your data should contain two sentences separated by a tab (\t)

This is the first sentence. \t This is the second sentence.\n
Continue Training \t BERT with transformers\n
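
For reference, here is a minimal sketch of loading such a file into sentence pairs. The file name corpus.txt is only an example; the repository's own data loading code may differ.

# Minimal sketch: read tab-separated sentence pairs from a corpus file.
# "corpus.txt" is an example name; use your own file path.
sentence_pairs = []
with open("corpus.txt", encoding="utf-8") as f:
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        first, second = line.split("\t", 1)
        sentence_pairs.append((first.strip(), second.strip()))

print(sentence_pairs[0])  # ('This is the first sentence.', 'This is the second sentence.')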

3. Continue training BERT

python main.py


Two models can be used

1. Using the transformers model BertForPreTraining (a minimal usage sketch follows this list)

  • inputs
    • input_ids # token ids of [sentence0, sentence1] produced by the tokenizer
    • token_type_ids # [0, 1] 0 marks tokens of sentence0, 1 marks tokens of sentence1
    • attention_mask # [1, 1] 1 for real tokens, padded positions are set to 0
    • labels # original token ids at the masked positions, -100 at positions ignored by the loss
    • next_sentence_label # [0 or 1] 0 means sentence1 really follows sentence0, 1 means sentence1 is a random sentence
    • ...
  • outputs
    • loss # masked_lm_loss + next_sentence_loss, the masked-token prediction loss plus the next-sentence prediction loss
    • prediction_logits
    • seq_relationship_logits
    • ...
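
A minimal sketch of one forward pass with BertForPreTraining, masking a single token by hand for illustration. The checkpoint name bert-base-uncased and the masked position are assumptions for the example, not necessarily what main.py does.

import torch
from transformers import BertTokenizer, BertForPreTraining

# Example checkpoint; the checkpoint used by main.py may differ.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

# Encoding a sentence pair builds input_ids, token_type_ids and attention_mask.
inputs = tokenizer("This is the first sentence.", "This is the second sentence.",
                   return_tensors="pt", padding="max_length", max_length=32, truncation=True)

# MLM labels: -100 everywhere except the masked position, which keeps the real token id.
labels = torch.full_like(inputs["input_ids"], -100)
mask_pos = 4  # position chosen by hand for illustration
labels[0, mask_pos] = inputs["input_ids"][0, mask_pos]
inputs["input_ids"][0, mask_pos] = tokenizer.mask_token_id

outputs = model(**inputs,
                labels=labels,
                next_sentence_label=torch.LongTensor([0]))  # 0: sentence1 really follows sentence0

print(outputs.loss)                     # masked_lm_loss + next_sentence_loss
print(outputs.prediction_logits.shape)  # (batch, seq_len, vocab_size)
print(outputs.seq_relationship_logits)  # (batch, 2)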

2. Using the transformers model BertForMaskedLM (a minimal usage sketch follows this list)

  • inputs
    • input_ids
    • token_type_ids # optional, not needed for MLM-only training
    • attention_mask
    • labels # original token ids at the masked positions, -100 at positions ignored by the loss
    • ...
  • outputs
    • loss # masked_lm_loss
    • logits # prediction scores over the vocabulary
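
A minimal MLM-only training step with BertForMaskedLM, using DataCollatorForLanguageModeling from transformers for random masking. The checkpoint name, texts, and hyperparameters are examples only.

import torch
from transformers import BertTokenizer, BertForMaskedLM, DataCollatorForLanguageModeling

# Example checkpoint and hyperparameters; adapt them to your own corpus and config.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

texts = ["Continue Training BERT with transformers",
         "This is the first sentence. This is the second sentence."]
encodings = [tokenizer(t, truncation=True, max_length=32) for t in texts]

# Randomly masks 15% of the tokens and builds labels with the real ids
# at the masked positions and -100 everywhere else.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
batch = collator(encodings)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
outputs = model(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
outputs.loss.backward()      # masked_lm_loss only
optimizer.step()
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)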

Reference