Hardware and training time?
whatever60 commented
Great work.
But what's the hardware used to reproduce this? And how long did it take to train?
Thanks
WangFeng18 commented
To reproduce the reported results exactly, use 8x Tesla V100 GPUs (or 16x V100) with a total batch size of 1024; training takes about 40 hours. We have not tried other settings.
However, I believe you can still get reasonable results with a smaller batch size such as 512 or 256.
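As a rough sketch of the arithmetic behind these numbers: the snippet below computes the per-GPU batch size for the batch sizes mentioned above, plus a GPU-hour estimate. It assumes the total batch is split evenly across GPUs and that the roughly 40 hour figure refers to the 8-GPU setup; neither is stated in this thread. The helper names `per_gpu_batch` and `gpu_hours` are hypothetical and not part of the repository.

```python
def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Per-GPU batch size, assuming an even data-parallel split (an assumption)."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch // num_gpus


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours for a run: number of GPUs times wall-clock hours."""
    return num_gpus * wall_clock_hours


if __name__ == "__main__":
    # Batch sizes and GPU counts taken from the comment above.
    for total in (1024, 512, 256):
        for gpus in (8, 16):
            print(f"total={total} gpus={gpus} per_gpu={per_gpu_batch(total, gpus)}")

    # Assumes the ~40 h figure applies to the 8x V100 configuration.
    print(f"estimated cost: {gpu_hours(8, 40)} GPU-hours")
```

Under the even-split assumption, a smaller total batch on the same GPUs lowers the per-GPU batch and therefore the memory needed per card, which is the practical reason to try 512 or 256 on more modest hardware.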
whatever60 commented
Thank you.
That's indeed a fair amount of computation compared to a traditional CNN.