How to distill a dataset whose size exceeds the GPU memory limit
Hi!
I wonder whether there is any way to distill much more data than fits in GPU memory. For a large-scale dataset, or for a typical 11G/12G GPU, that would be really useful. At first I thought state.distributed in your code was intended for that, i.e. spreading the distilled data across multiple GPUs, but then I found out I was wrong: it seems the code only distills as much data as fits in a single GPU's memory. So, any advice on this matter? (I've put a rough sketch of what I'm imagining below.)
Thanks a lot!
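To make the question concrete, here is a rough, untested sketch of the kind of chunking I have in mind. It only shows the memory handling: the loss and model here are dummy placeholders, not the real distillation objective. The distilled images live on the CPU as one big leaf tensor, and only one chunk at a time is moved to the GPU, so their gradients accumulate back on the CPU:

```python
import torch
import torch.nn.functional as F

# Untested sketch: distilled images stay on CPU; only one chunk at a
# time (plus the model) occupies GPU memory. The loss below is a dummy
# stand-in for the actual distillation step.
device = "cuda"
model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
).to(device)

distilled = torch.randn(50000, 3, 32, 32, requires_grad=True)  # CPU leaf tensor
labels = torch.randint(0, 10, (50000,))
opt = torch.optim.Adam([distilled], lr=0.01)

chunk = 1024
opt.zero_grad()
for i in range(0, distilled.size(0), chunk):
    x = distilled[i:i + chunk].to(device)   # only this chunk is on the GPU
    y = labels[i:i + chunk].to(device)
    loss = F.cross_entropy(model(x), y)
    loss.backward()    # gradients flow back into the CPU tensor `distilled`
opt.step()
```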
So the torch.distributed code in this repo is used for dataset distillation with multiple networks. Could you please give an example of how to run distributed training? I don't see one in the README.
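For context, this is roughly how I picture the multi-network distributed mode; it is only my guess based on your description, not the actual code in this repo. Each process would hold its own randomly initialized network on its own GPU, and the gradients with respect to the shared distilled data would be averaged across processes:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

# Guess at the multi-network setup (not the repo's code): one network
# per process, gradients w.r.t. the distilled data averaged over ranks.
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)
device = torch.device("cuda", rank)

distilled = torch.randn(100, 3, 32, 32, device=device, requires_grad=True)
targets = torch.randint(0, 10, (100,), device=device)
model = torch.nn.Sequential(            # different random init on each rank
    torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10)
).to(device)

loss = F.cross_entropy(model(distilled), targets)
loss.backward()
dist.all_reduce(distilled.grad, op=dist.ReduceOp.SUM)  # combine the networks' gradients
distilled.grad /= dist.get_world_size()
```

I imagine this would be launched with something like `torchrun --nproc_per_node=4 this_script.py` (the script name is just a placeholder), but I'd still appreciate the actual command for this repo.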
Also, I think splitting the data across multiple GPUs and computing gradients on each GPU would be much more convenient. But that would mean building a computation graph across multiple devices and backpropagating through a subgraph on each device. Is that feasible? Any advice? (A small example of what I mean is below.)
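Just to be concrete about this second point, here is a toy example of a graph spanning two GPUs (it needs two visible devices). As far as I understand, autograd already backpropagates through the `.to()` transfer, so each device only ever holds its own subgraph:

```python
import torch

# Toy example of one computation graph spread over two GPUs.
x  = torch.randn(8, 3, requires_grad=True, device="cuda:0")
w0 = torch.randn(3, 4, requires_grad=True, device="cuda:0")
w1 = torch.randn(4, 2, requires_grad=True, device="cuda:1")

h = x @ w0             # subgraph on cuda:0
h = h.to("cuda:1")     # cross-device edge in the graph
y = (h @ w1).sum()     # subgraph on cuda:1

y.backward()           # gradients flow back across both devices
print(w0.grad.device, w1.grad.device)   # cuda:0 cuda:1
```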
I’m going to close this issue for now. Feel free to open a new one if you have other questions!