Visual-Attention-Network/SegNeXt

Learning Rate and Batch Size

FabianSchuetze opened this issue · 3 comments

Hi,

thanks for the fantastic work. I am currently trying to train the tiny model from the Imagenet-pretrained weights on the ADE dataset to begin integrating your work into mmsegmentation, as discussed here and here.

However, I am confused about the batch size and learning rate. In the paper, you mention a batch size of 16 and that you use 8 GPUs. However, the config sets samples_per_gpu to 8. Can you kindly tell me what total batch size was used for training and the corresponding learning rate?

Best wishes & many thanks,
Fabian

Hi,

The total batch size is 16 for the ADE20K dataset, and we use 2 GPUs to train SegNeXt on this benchmark.

As @MenghaoGuo said, we train SegNeXt-tiny with 2 GPUs and SegNeXt-large with 4 GPUs, because we don't have that many GPUs. The learning rate and total batch size are the same in both cases.
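For anyone confused by the same point: with synchronized data-parallel training, the effective batch size is the per-GPU setting times the number of GPUs, so `samples_per_gpu=8` on 2 GPUs matches the paper's batch size of 16. A minimal sketch of that arithmetic (the function name here is illustrative; `samples_per_gpu` mirrors the mmsegmentation config field):

```python
def total_batch_size(num_gpus: int, samples_per_gpu: int) -> int:
    """Effective batch size under synchronized data-parallel training:
    each GPU processes samples_per_gpu examples per step."""
    return num_gpus * samples_per_gpu

# SegNeXt-tiny on ADE20K: 2 GPUs, samples_per_gpu=8 in the config
print(total_batch_size(2, 8))  # -> 16

# SegNeXt-large reportedly uses 4 GPUs with the same total batch size,
# which would imply samples_per_gpu=4 (an inference, not confirmed above).
print(total_batch_size(4, 4))  # -> 16
```

Since the total batch size is unchanged, the configured learning rate should not need linear scaling between the two setups.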

Thanks for the comments - they were very helpful. I will try another training run and report the results.