Poor results with distributed (multi-GPU) training?
harryxu-yscz opened this issue · 4 comments
harryxu-yscz commented
I got reasonable results training on one GPU, but not on two GPUs (with batch_size=50). Specifically, the attention mask would not change and the generator ended up returning the original image. I tried scaling lr_G, lr_D, and lambda_D_cond, but no luck. Any suggestions?
harryxu-yscz commented
Problem solved by properly scaling the parameters.
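The thread does not spell out which scaling was used. For reference, here is a minimal sketch of one common heuristic, the linear scaling rule, where the learning rates grow in proportion to the effective batch size. The scale_lr helper and the base values below are illustrative assumptions, not taken from this repo or this thread:

```python
# Hypothetical sketch of the linear scaling rule for learning rates when
# the effective batch size grows (e.g. going from one GPU to two).
# The base values below are assumed placeholders, not the repo's defaults.

def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale a learning rate in proportion to the effective batch size."""
    return base_lr * new_batch_size / base_batch_size

if __name__ == "__main__":
    base_batch, new_batch = 25, 50                  # assumed 1-GPU vs 2-GPU effective batch
    lr_G = scale_lr(1e-4, base_batch, new_batch)    # 1e-4 -> 2e-4
    lr_D = scale_lr(1e-4, base_batch, new_batch)    # 1e-4 -> 2e-4
    print(f"lr_G={lr_G}, lr_D={lr_D}, batch_size={new_batch}")
```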
joyyang1997 commented
@harryxu-yscz Hi, I ran into the same problem. Would you please share your experience with me? Thank you very much!
zxu7 commented
@joyyang1997
Try tuning these params for the multi-GPU run: lambda_mask_smooth and lambda_D_cond. For example, I used these params:
--batch_size 50 --gpu_ids 0,1 --lambda_D_cond 8000 \
--lambda_mask_smooth 5e-6
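For context on why these two weights matter (an assumption based on the option names, not something confirmed in this thread): in GANimation-style models, lambda_mask_smooth typically weights a total-variation smoothness penalty on the attention mask, and lambda_D_cond weights the conditional (expression) term of the discriminator loss. A minimal sketch of such a smoothness term under that assumption; the function name and tensor shapes are illustrative:

```python
# Hypothetical sketch: total-variation smoothness penalty on the attention
# mask, weighted by lambda_mask_smooth. Names and shapes are assumptions.
import torch

def mask_smoothness_loss(mask: torch.Tensor, lambda_mask_smooth: float) -> torch.Tensor:
    """TV penalty for an attention mask of shape (N, 1, H, W)."""
    dh = torch.mean(torch.abs(mask[:, :, 1:, :] - mask[:, :, :-1, :]))  # vertical differences
    dw = torch.mean(torch.abs(mask[:, :, :, 1:] - mask[:, :, :, :-1]))  # horizontal differences
    return lambda_mask_smooth * (dh + dw)

if __name__ == "__main__":
    mask = torch.rand(50, 1, 128, 128)              # fake batch of attention masks
    print(mask_smoothness_loss(mask, 5e-6).item())  # small weight -> small penalty
```

The intuition, again a guess rather than something stated in the thread: if the mask smoothness penalty is too strong relative to the other loss terms, the mask can saturate and the generator falls back to reproducing the input image, which matches the symptom in the first comment; hence the very small 5e-6 weight.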
joyyang1997 commented
Thank you very much, I'll try it.