Using CollectiveAllReduceStrategy to train the BERT model on multiple GPUs
weizhifei12345 opened this issue · 2 comments
weizhifei12345 commented
Dear author:
I see that you have implemented multi-GPU training for BERT with MirroredStrategy. Now I want to train on multiple GPUs with collective all-reduce instead, so I set the distribution strategy with tf.contrib.distribute.CollectiveAllReduceStrategy(num_gpus_per_worker=2) and then start training with train_and_evaluate, but I run into the error: unsupported operand type(s) for +: 'PerReplica' and 'str'. I don't know how to solve it. (I have 2 GPUs, which are V100s.)
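For reference, this is roughly how I wire the strategy into the Estimator (a minimal sketch only; the model_fn, input_fn, and model_dir below are placeholders, not the functions from this repo):

```python
import tensorflow as tf

def model_fn(features, labels, mode):
    # Placeholder model; replace with the BERT model_fn.
    logits = tf.layers.dense(features['x'], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def input_fn():
    # Placeholder dataset; replace with the BERT input pipeline.
    ds = tf.data.Dataset.from_tensor_slices(
        ({'x': tf.random.uniform([64, 8])}, tf.zeros([64], tf.int64)))
    return ds.repeat().batch(8)

# Collective all-reduce over the 2 local V100s (and across workers once
# TF_CONFIG describes the cluster).
strategy = tf.contrib.distribute.CollectiveAllReduceStrategy(num_gpus_per_worker=2)
run_config = tf.estimator.RunConfig(train_distribute=strategy)

estimator = tf.estimator.Estimator(
    model_fn=model_fn, config=run_config, model_dir='/tmp/bert_collective')
train_spec = tf.estimator.TrainSpec(input_fn=input_fn, max_steps=1000)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn, steps=10)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```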
JayYip commented
You may need to provide more information about the problem. Which version of TensorFlow are you using?
weizhifei12345 commented
Thank you for your reply. I have solved the problem. It was the cluster IP in my configuration that was not set properly.
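For anyone hitting the same error: the fix was to make the TF_CONFIG cluster specification point at the correct worker addresses. A sketch of what that looks like (the IP addresses and ports below are examples only and must match the machines actually used):

```python
import json
import os

# Set before creating the strategy/Estimator, on every worker.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'worker': ['10.0.0.1:2222', '10.0.0.2:2222'],
    },
    # Each worker sets its own index (0 on the first machine, 1 on the second).
    'task': {'type': 'worker', 'index': 0},
})
```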