Issues
How low does the token-level MLM loss usually get when BERT pre-training stops converging?
#34 opened by MingLunHan · 0 comments
Some confusions
#32 opened by leileilin · 2 comments
Question about running pretrain.py
#23 opened by littleflow3r · 0 comments
Visualizing the attention weights
#31 opened by ahof1704 · 0 comments
Running SQuAD
#28 opened by ismaeel123 · 2 comments
Does this support multi-GPU training?
#29 opened by abhisheksgumadi · 1 comment
Usage
#25 opened by JingsenZhang · 0 comments
Masked subword prediction problem
#24 opened by akakakakakaa · 3 comments
Nice work!
#4 opened by thomwolf · 2 comments
Questions about loading the pretrained model
#21 opened by mingbocui · 1 comment
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128)
#13 opened by likerainsun · 1 comment
Can you please provide books_large_all.txt? And also the pretrained model uncased_L-12_H-768_A-12/bert_model.ckpt?
#18 opened by AyanKumarBhunia · 1 comment
Can you please provide books_large_all.txt?
#17 opened by AyanKumarBhunia · 1 comment
Pretraining with checkpoints
#15 opened by abhi060698 · 2 comments
Padding bugs in data preprocessing
#10 opened by AppleHolic · 0 comments
How can we use it on a test dataset?
#12 opened by GeetDsa · 1 comment
Question about fine-tuning
#9 opened by graykode · 1 comment
Pre-training for Chinese text
#8 opened by Jason-kid · 1 comment
Any sample dataset for pre-training?
#7 opened by SeekPoint · 1 comment
h = (scores @ v).transpose(1, 2).contiguous() RuntimeError: CUDA error: out of memory
#5 opened by leerelive · 1 comment
Could you add a license file?
#2 opened by theSage21 · 2 comments