guotong1988/BERT-GPU
Multi-GPU pre-training of BERT from scratch on one machine, without Horovod (data parallelism)
Python · Apache-2.0
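The repository's approach is synchronous data parallelism: each GPU holds a full model replica, computes gradients on its own shard of the batch, and the per-replica gradients are averaged before a single shared update. A minimal NumPy sketch of that arithmetic on a toy linear model (all names here are illustrative, not taken from this repo, and the real code does the averaging across TensorFlow GPU towers):

```python
import numpy as np

# Toy model: linear regression, loss = mean((x @ w - y)^2).
def grad(w, batch):
    x, y = batch
    return 2.0 * x.T @ (x @ w - y) / len(y)

def data_parallel_step(w, shards, lr=0.1):
    # Each simulated replica ("GPU") computes a gradient on its own
    # data shard; the per-replica gradients are averaged and applied
    # once -- the essence of synchronous data parallelism.
    grads = [grad(w, shard) for shard in shards]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w = np.zeros(3)

# Split one global batch evenly across 4 simulated replicas.
shards = [(x[i::4], y[i::4]) for i in range(4)]
for _ in range(200):
    w = data_parallel_step(w, shards)
print(np.round(w, 2))  # converges toward true_w
```

With equal-sized shards, the averaged per-shard gradient equals the full-batch gradient, so the update is identical to single-device training on the whole batch.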
Issues
How can gradient accumulation be further implemented?
#35 opened by 600DZY - 5
model_fn should return an EstimatorSpec.
#34 opened by Nanamumuhan - 6
During eval, getting "ValueError: model_fn should return an EstimatorSpec"; training is OK
#9 opened by aurotripathy - 0
[Try] 1-GPU pretrain with a big learning rate for 1M steps, then 1-GPU pretrain with a small learning rate for another 1M steps
#28 opened by guotong1988 - 0
GPT support
#23 opened by guotong1988 - 0
TensorFlow2 support
#25 opened by guotong1988 - 0
"How To Pre-train BERT In GPUs"
#27 opened by guotong1988 - 9
An error like this: Segmentation fault (core dumped). Is the configuration wrong?
#33 opened by Nanamumuhan - 1
Some questions about multi-GPU training
#30 opened by rxc205 - 3
After training 100K steps, an error appears in the do_eva stage
#31 opened by rxc205 - 4
Is num_train_steps the step count for one GPU or for all GPUs?
#26 opened by zhengyima - 2
Slower than single GPU
#3 opened by hankcs - 2
XLNet support
#22 opened by guotong1988 - 17
run_pretraining_gpu.py not working
#16 opened by 652994331 - 9
Getting the error: tensorflow.python.framework.errors_impl.InvalidArgumentError
#20 opened by shuxiaobo - 5
ModuleNotFoundError: No module named 'tensorflow.python.distribute.cross_device_ops'
#13 opened by vanpersie32 - 1
Cannot reload pre-trained model
#12 opened by yick2232 - 2
Why is the reshaping necessary?
#7 opened by eduOS - 2
ValueError: You must specify an aggregation method to update a MirroredVariable in Tower Context.
#8 opened by nlp4whp - 1
experiment result
#6 opened by guotong1988 - 6
Is this just for pre-training BERT?
#1 opened by HaishuoFang - 1
Loss does not decrease
#5 opened by guotong1988 - 1
Cannot train on multiple GPUs
#4 opened by andy-yangz - 2
Error when running create_pretraining_data.py
#2 opened by sportzhang
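Issue #35 above asks about gradient accumulation. A minimal NumPy sketch of the idea — not this repo's implementation: gradients from several micro-batches are summed and applied as one averaged update, emulating a larger batch within fixed memory (all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 2))
y = x @ np.array([0.5, -1.0])
w0 = np.zeros(2)

# Toy model: linear regression, loss = mean((x @ w - y)^2).
def grad(w, batch):
    xb, yb = batch
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

def accumulated_step(w, micro_batches, lr=0.1):
    # Sum gradients over the micro-batches, then apply one update with
    # the averaged gradient. Only one micro-batch's activations need to
    # be resident at a time, yet the update matches a large-batch step.
    acc = np.zeros_like(w)
    for mb in micro_batches:
        acc += grad(w, mb)
    return w - lr * acc / len(micro_batches)

# Four equal-sized micro-batches reproduce the full-batch update.
micro = [(x[i::4], y[i::4]) for i in range(4)]
w_acc = accumulated_step(w0, micro)
w_full = w0 - 0.1 * grad(w0, (x, y))
print(np.allclose(w_acc, w_full))
```

The equivalence holds exactly only when the micro-batches have equal size and the per-batch gradient is itself a mean over examples; otherwise the accumulated gradient should be weighted by micro-batch sizes.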