Issues
How low does the token-level MLM loss usually get when BERT pre-training stops converging?
#34 opened by MingLunHan · 0 comments
Some confusions
#32 opened by leileilin · 2 comments
Question about running pretrain.py
#23 opened by littleflow3r · 0 comments
Visualizing the attention weights
#31 opened by ahof1704 · 0 comments
Running SQuAD
#28 opened by ismaeel123 · 2 comments
Does this support multi-GPU training?
#29 opened by abhisheksgumadi · 1 comment
Usage
#25 opened by JingsenZhang · 0 comments
Masked subword prediction problem
#24 opened by akakakakakaa · 3 comments
Nice work!
#4 opened by thomwolf · 2 comments
Questions about loading the pretrained model
#21 opened by mingbocui · 1 comment
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3793: ordinal not in range(128)
#13 opened by likerainsun · 1 comment
Can you please provide books_large_all.txt? And also the pretrained model uncased_L-12_H-768_A-12/bert_model.ckpt?
#18 opened by AyanKumarBhunia · 1 comment
Can you please provide books_large_all.txt?
#17 opened by AyanKumarBhunia · 1 comment
Pretraining with checkpoints
#15 opened by abhi060698 · 2 comments
Padding bugs in data preprocessing
#10 opened by AppleHolic · 0 comments
How can we use it on a test dataset?
#12 opened by GeetDsa · 1 comment
Question about fine-tuning
#9 opened by graykode · 1 comment
Pre-training for Chinese text
#8 opened by Jason-kid · 1 comment
Any sample dataset for pre-training?
#7 opened by SeekPoint · 1 comment
h = (scores @ v).transpose(1, 2).contiguous() RuntimeError: CUDA error: out of memory
#5 opened by leerelive · 1 comment
Could you add a license file?
#2 opened by theSage21 · 2 comments