This is fantastic, great work! Just to be clear...
slerman12 opened this issue · 1 comment
slerman12 commented
Just making sure: this lazy wrapper somehow divvies up the computation per GPU budget, right? It doesn't just... sub-sample a smaller batch and ignore the remainder, right?
rentruewang commented
Haha, no it doesn't. It divides the original batch into smaller batches and accumulates gradients accordingly, so it should effectively be the same as computing the original batch. There seems to be a bug (#16), however, and I'm a little too busy this week to really look into what happened.
As for devices, it currently only works on a single GPU (as a proof of concept). I'm still figuring out a way for it to work across multiple GPUs.
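
For readers wondering why splitting the batch should give the same result as the full batch, here is a rough, dependency-free sketch of the idea. It is not the wrapper's actual code: the toy model (a 1-D linear fit with mean-squared-error loss) and the names `mse_grad` and `accumulated_grad` are made up for illustration. The assumption is that each micro-batch gradient gets weighted by its share of the full batch, so the leftover chunk is counted rather than dropped.

```python
# Illustrative only: a 1-D linear model y = w * x with mean-squared-error loss.
# None of these names come from the library discussed in this thread.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Split the batch into micro-batches and accumulate their gradients.

    Each micro-batch gradient is weighted by its share of the full batch,
    so no sample is dropped and the final (possibly smaller) chunk counts
    in proportion to its size.
    """
    total = len(xs)
    grad = 0.0
    for start in range(0, total, micro_batch_size):
        chunk_x = xs[start:start + micro_batch_size]
        chunk_y = ys[start:start + micro_batch_size]
        grad += mse_grad(w, chunk_x, chunk_y) * len(chunk_x) / total
    return grad


# Seven samples with a micro-batch size of 3 leaves a remainder chunk of 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0]
w = 0.5

full = mse_grad(w, xs, ys)
accumulated = accumulated_grad(w, xs, ys, micro_batch_size=3)

# The two gradients agree up to floating-point rounding.
print(abs(full - accumulated) < 1e-9)
```

The weighting by `len(chunk_x) / total` is what keeps an uneven split honest. Averaging the micro-batch gradients with equal weights would over-count the samples in the short final chunk.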