Visual-Attention-Network/SegNeXt

Learning Rate and Batch Size

FabianSchuetze opened this issue · 3 comments

Hi,

thanks for the fantastic work. I am currently trying to train the tiny model from the Imagenet-pretrained weights on the ADE dataset to begin integrating your work into mmsegmentation, as discussed here and here.

However, I am confused about the batch size and learning rate. In the paper, you mention a batch size of 16 and that you use 8 GPUs. However, the config sets samples_per_gpu to 8. Can you kindly tell me what total batch size was used for training and the corresponding learning rate?

Best wishes & many thanks,
Fabian

Hi,

The total batch size is 16 for the ADE20K dataset, and we use 2 GPUs to train SegNeXt on this benchmark.

As @MenghaoGuo said, we train SegNeXt-tiny with 2 GPUs and SegNeXt-large with 4 GPUs, because we don't have that many GPUs. The learning rate and total batch size are the same in both cases.
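For anyone confused by the same point: with synchronized data-parallel training, the effective batch size is the per-GPU setting times the number of GPUs, so `samples_per_gpu=8` on 2 GPUs matches the paper's batch size of 16. A minimal sketch of that arithmetic (the function name here is illustrative; `samples_per_gpu` mirrors the mmsegmentation config field):

```python
def total_batch_size(num_gpus: int, samples_per_gpu: int) -> int:
    """Effective batch size under synchronized data-parallel training:
    each GPU processes samples_per_gpu examples per step."""
    return num_gpus * samples_per_gpu

# SegNeXt-tiny on ADE20K: 2 GPUs, samples_per_gpu=8 in the config
print(total_batch_size(2, 8))  # -> 16

# SegNeXt-large reportedly uses 4 GPUs with the same total batch size,
# which would imply samples_per_gpu=4 (an inference, not confirmed above).
print(total_batch_size(4, 4))  # -> 16
```

Since the total batch size is unchanged, the configured learning rate should not need linear scaling between the two setups.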

Thanks for the comments - they were very helpful. I will try another training run and report the results.