WangFeng18/Swin-Transformer

Hardware and training time?

Closed this issue · 2 comments

Great work.

But what's the hardware used to reproduce this? And how long did it take to train?

Thanks

To perfectly reproduce the reported results, we used 8x Tesla V100 GPUs with a total batch size of 1024 (or 16x V100), which took about 40 hours. We have not tried other settings.
However, I believe that with a smaller batch size such as 512 or 256, you can still get reasonable results.
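For anyone adapting the setup above to fewer GPUs, here is a minimal sketch of the arithmetic involved: splitting the total batch across devices, and (as an assumption, not something stated in this thread) applying the common linear learning-rate scaling rule when shrinking the total batch. The base learning rate below is a placeholder, not the repository's actual value.

```python
# Sketch of batch/LR bookkeeping when reproducing on different hardware.
# Assumptions: linear LR scaling rule; base_lr is a hypothetical placeholder.

def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Samples each GPU sees per step when the total batch is split evenly."""
    assert total_batch % num_gpus == 0, "total batch must divide evenly"
    return total_batch // num_gpus

def scaled_lr(base_lr: float, total_batch: int, base_batch: int = 1024) -> float:
    """Linear scaling rule: LR proportional to total batch size (an assumption)."""
    return base_lr * total_batch / base_batch

# Reference setting from this thread: 8 GPUs, total batch 1024.
print(per_gpu_batch(1024, 8))        # 128 samples per V100
# Halving the total batch to 512 halves the (hypothetical) base LR of 1e-3.
print(scaled_lr(1e-3, 512))
```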

Thank you.

That's indeed a fair amount of computation compared to traditional CNNs.