Is the main difference between the original BERT and bert-multi-gpu just these lines below?
# In the full script, AllReduceCrossDeviceOps presumably comes from
# tf.contrib.distribute and RunConfig from tf.estimator.
tf.logging.info("Use normal RunConfig")
dist_strategy = tf.contrib.distribute.MirroredStrategy(
    num_gpus=FLAGS.num_gpu_cores,
    cross_device_ops=AllReduceCrossDeviceOps('nccl', num_packs=FLAGS.num_gpu_cores),
)
log_every_n_steps = 8
run_config = RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    log_step_count_steps=log_every_n_steps,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps)
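For comparison, the original BERT scripts build a TPU-oriented RunConfig with no distribution strategy at all, roughly like this (a simplified sketch of run_classifier.py in google-research/bert, not an exact copy):

# Simplified sketch of the original BERT RunConfig; no distribution strategy is set.
run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu_cluster_resolver,
    master=FLAGS.master,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_tpu_cores,
        per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2))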
How can I change the original BERT for multi-GPU fine-tuning? Thank you!
You should use tf.contrib.distribute.MirroredStrategy and implement AdamWeightDecayOptimizer yourself, because the original code implemented by Google does not support a distribution strategy. If you are using bert-multi-gpu, you only need to pass --use_gpu=true and --num_gpu_cores <GPUs> to the entry script to enable multi-GPU support.
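For reference, a minimal TF 1.x sketch of how those flags typically select the strategy; the flag names follow this thread, and the actual entry scripts may differ in detail:

import tensorflow as tf

# FLAGS.use_gpu / FLAGS.num_gpu_cores are assumed to be defined by the entry script.
if FLAGS.use_gpu and FLAGS.num_gpu_cores >= 2:
    dist_strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=FLAGS.num_gpu_cores)
else:
    dist_strategy = None  # single-device training, as in the original BERT scripts

run_config = tf.estimator.RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps)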
Thanks a lot for replying!
When I add these lines at run_seq_labeling.py lines 693-699:
accuracy = tf.metrics.accuracy(label_ids, predictions, output_mask)
loss = tf.metrics.mean(per_example_loss)
return {
    "eval_accuracy": accuracy,
    "eval_loss": loss,
    "precision": tf_metrics.precision(label_ids, predictions, num_labels, positions=positions, average='macro'),
    "recall": tf_metrics.recall(label_ids, predictions, num_labels, positions=positions, average='macro'),
    "f1_score": tf_metrics.f1(label_ids, predictions, num_labels, positions=positions, average='macro'),
}
The code for tf_metrics (multiclass F1 score) is from https://github.com/guillaumegenthial/tf_metrics/blob/master/tf_metrics/__init__.py.
I get the following error:
TypeError: Fetch argument PerReplica:{
0 /job:localhost/replica:0/task:0/device:GPU:0: <tf.Tensor 'Mean_2:0' shape=() dtype=float32>
1 /job:localhost/replica:0/task:0/device:GPU:1: <tf.Tensor 'replica_1/Mean_2:0' shape=() dtype=float32>} has type <class 'tensorflow.python.distribute.values.PerReplica'>, must be a string or Tensor. (Can not convert a PerReplica into a Tensor or Operation.)
Do you know what's wrong here? Is there a better way to evaluate multiclass scores?
It seems that MirroredStrategy is not compatible with tf_metrics. You can confirm this issue with the author.
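If distributed evaluation is not essential, one possible workaround (a sketch under that assumption, not something verified against this project) is to keep MirroredStrategy for training only and let evaluation run on a single device, so the eval metric tensors never become PerReplica values:

run_config = tf.estimator.RunConfig(
    train_distribute=dist_strategy,
    # eval_distribute deliberately left unset: evaluation falls back to one device,
    # so tf_metrics values are ordinary tensors instead of PerReplica objects.
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps)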
Is there a tutorial I can refer to for the changes that make AdamWeightDecayOptimizer support multiple GPUs? And what if I want to switch to a different optimizer?
You can compare the AdamWeightDecayOptimizer in this project's custom_optimization.py with the one in the official project; the original AdamWeightDecayOptimizer does not implement the distributed part.
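To make that comparison concrete, here is a rough, hypothetical illustration (not the project's actual code; see custom_optimization.py for that) of what "implementing the distributed part" usually means in TF 1.x: instead of overriding apply_gradients wholesale and creating adam_m/adam_v variables inside it, the optimizer implements the per-variable hooks so the strategy-aware apply_gradients of tf.train.Optimizer can replicate and merge the updates.

# Hypothetical sketch only; sparse updates and the LayerNorm/bias
# weight-decay exclusions from BERT are omitted for brevity.
import tensorflow as tf

class SketchAdamWeightDecayOptimizer(tf.train.Optimizer):
    """Adam with weight decay, written against the per-variable hooks so the
    DistributionStrategy-aware base apply_gradients drives the update."""

    def __init__(self, learning_rate, weight_decay_rate=0.01, beta_1=0.9,
                 beta_2=0.999, epsilon=1e-6, name="SketchAdamWeightDecayOptimizer"):
        super(SketchAdamWeightDecayOptimizer, self).__init__(False, name)
        self.learning_rate = learning_rate
        self.weight_decay_rate = weight_decay_rate
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon

    def _create_slots(self, var_list):
        # Slot variables replace the ad-hoc adam_m/adam_v created inside
        # apply_gradients in the original BERT optimizer.
        for v in var_list:
            self._zeros_slot(v, "m", self._name)
            self._zeros_slot(v, "v", self._name)

    def _resource_apply_dense(self, grad, var):
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")
        next_m = self.beta_1 * m + (1.0 - self.beta_1) * grad
        next_v = self.beta_2 * v + (1.0 - self.beta_2) * tf.square(grad)
        update = next_m / (tf.sqrt(next_v) + self.epsilon)
        update += self.weight_decay_rate * var
        next_var = var - self.learning_rate * update
        return tf.group(var.assign(next_var), m.assign(next_m), v.assign(next_v))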
@haoyuhu Hello, your project is great! I have a question: the custom AdamWeightDecayOptimizer calls the native apply_gradients (unlike BERT, which calls its own apply_gradients and therefore does not do global_step + 1), so is the + 1 here unnecessary? (https://github.com/HaoyuHu/bert-multi-gpu/blob/master/custom_optimization.py#L104)
It should still be needed: the implementation in AdamWeightDecayOptimizer is not the native apply_gradients. In addition, the fp16 scenario is taken into account here; when a step does not converge, global_step should not be incremented.
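A minimal sketch of that fp16 guard, using hypothetical names (grads, apply_op, global_step) and not the project's exact code, could look like:

import tensorflow as tf

# `grads` is the list of (possibly None) gradients, `apply_op` the op returned
# by apply_gradients, and `global_step` the training-step variable; all are
# assumed to exist in the surrounding training code.
all_finite = tf.reduce_all(
    [tf.reduce_all(tf.is_finite(g)) for g in grads if g is not None])
new_global_step = tf.cond(all_finite,
                          lambda: global_step + 1,
                          lambda: tf.identity(global_step))
train_op = tf.group(apply_op, global_step.assign(new_global_step))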