guotong1988/BERT-GPU
Multi-GPU pre-training of BERT from scratch on one machine, without Horovod (data parallelism)
Python · Apache-2.0
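The repository's approach is synchronous data parallelism: each GPU holds a full model replica, computes gradients on its own shard of the batch, and the per-replica gradients are averaged before a single shared update. A minimal NumPy sketch of that arithmetic on a toy linear model (all names here are illustrative, not taken from this repo, and the real code does the averaging across TensorFlow GPU towers):

```python
import numpy as np

# Toy model: linear regression, loss = mean((x @ w - y)^2).
def grad(w, batch):
    x, y = batch
    return 2.0 * x.T @ (x @ w - y) / len(y)

def data_parallel_step(w, shards, lr=0.1):
    # Each simulated replica ("GPU") computes a gradient on its own
    # data shard; the per-replica gradients are averaged and applied
    # once -- the essence of synchronous data parallelism.
    grads = [grad(w, shard) for shard in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w = np.zeros(3)

# Split one global batch evenly across 4 simulated replicas.
shards = [(x[i::4], y[i::4]) for i in range(4)]
for _ in range(200):
    w = data_parallel_step(w, shards)
print(np.round(w, 2))  # converges toward true_w
```

With equal-sized shards, the averaged per-shard gradient equals the full-batch gradient, so the update is identical to single-device training on the whole batch.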
Issues
How can gradient accumulation be further implemented?
#35 opened by 600DZY - 5
model_fn should return an EstimatorSpec.
#34 opened by Nanamumuhan - 6
During eval, getting "ValueError: model_fn should return an EstimatorSpec"; training is OK
#9 opened by aurotripathy - 0
[Try] 1-GPU pretrain with a big learning rate for 1M steps, then 1-GPU pretrain with a small learning rate for another 1M steps
#28 opened by guotong1988 - 0
GPT support
#23 opened by guotong1988 - 0
TensorFlow2 support
#25 opened by guotong1988 - 0
"How To Pre-train BERT In GPUs"
#27 opened by guotong1988 - 9
An error like this: Segmentation fault (core dumped). Is the configuration wrong?
#33 opened by Nanamumuhan - 1
Some questions about multi-GPU training
#30 opened by rxc205 - 3
After training 100K steps, an error appears in the do_eva stage
#31 opened by rxc205 - 4
Is num_train_steps the step count for one GPU or for all GPUs?
#26 opened by zhengyima - 2
Slower than single GPU
#3 opened by hankcs - 2
XLNet support
#22 opened by guotong1988 - 17
run_pretraining_gpu.py not working
#16 opened by 652994331 - 9
Getting the error: tensorflow.python.framework.errors_impl.InvalidArgumentError
#20 opened by shuxiaobo - 5
ModuleNotFoundError: No module named 'tensorflow.python.distribute.cross_device_ops'
#13 opened by vanpersie32 - 1
Cannot reload pre-trained model
#12 opened by yick2232 - 2
Why is the reshaping necessary?
#7 opened by eduOS - 2
ValueError: You must specify an aggregation method to update a MirroredVariable in Tower Context.
#8 opened by nlp4whp - 1
experiment result
#6 opened by guotong1988 - 6
Is this just for pre-training BERT?
#1 opened by HaishuoFang - 1
Loss does not decrease
#5 opened by guotong1988 - 1
Cannot train on multiple GPUs
#4 opened by andy-yangz - 2
Error when running create_pretraining_data.py
#2 opened by sportzhang
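Issue #35 above asks about gradient accumulation. A minimal NumPy sketch of the idea — not this repo's implementation: gradients from several micro-batches are summed and applied as one averaged update, emulating a larger batch within fixed memory (all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 2))
y = x @ np.array([0.5, -1.0])
w0 = np.zeros(2)

# Toy model: linear regression, loss = mean((x @ w - y)^2).
def grad(w, batch):
    xb, yb = batch
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

def accumulated_step(w, micro_batches, lr=0.1):
    # Sum gradients over the micro-batches, then apply one update with
    # the averaged gradient. Only one micro-batch's activations need to
    # be resident at a time, yet the update matches a large-batch step.
    acc = np.zeros_like(w)
    for mb in micro_batches:
        acc += grad(w, mb)
    return w - lr * acc / len(micro_batches)

# Four equal-sized micro-batches reproduce the full-batch update.
micro = [(x[i::4], y[i::4]) for i in range(4)]
w_acc = accumulated_step(w0, micro)
w_full = w0 - 0.1 * grad(w0, (x, y))
print(np.allclose(w_acc, w_full))
```

The equivalence holds exactly only when the micro-batches have equal size and the per-batch gradient is itself a mean over examples; otherwise the accumulated gradient should be weighted by micro-batch sizes.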