ssnl/dataset-distillation

How to distill a dataset whose size exceeds the GPU memory limit

Closed this issue · 4 comments

zw615 commented

Hi!
I wonder if there is any way to distill much more data than fits in GPU memory. For a large-scale dataset, or a typical 11G/12G GPU, that could be really useful. At first I thought state.distributed in your code was intended for that, by spreading the distilled data across multiple GPUs, but then I found out I was wrong. It seems this code only distills as much data as fits in a single GPU's memory. So, any advice on this matter?

Thanks a lot!
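For reference, one generic memory-saving workaround (not what this repo implements) is to treat the distilled images as learnable parameters and accumulate gradients over chunks, so activation memory scales with the chunk size rather than the full distilled set. A minimal sketch, where toy_loss is a hypothetical placeholder for the actual distillation objective and the sizes are arbitrary:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Learnable distilled images (e.g. 1000 CIFAR-sized images).
distilled = torch.randn(1000, 3, 32, 32, device=device, requires_grad=True)
optimizer = torch.optim.SGD([distilled], lr=0.1)

def toy_loss(chunk):
    # Hypothetical stand-in for the distillation loss on one chunk.
    return chunk.pow(2).mean()

chunk_size = 100
optimizer.zero_grad()
for start in range(0, distilled.size(0), chunk_size):
    chunk = distilled[start:start + chunk_size]
    # Weight each chunk by its share of the data, then backprop immediately
    # so this chunk's graph is freed before the next one is built.
    loss = toy_loss(chunk) * (chunk.size(0) / distilled.size(0))
    loss.backward()
optimizer.step()
```

Note this only addresses memory for the outer parameters; the real distillation loss involves an inner training loop on the distilled data, which makes naive chunking less straightforward.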

ssnl commented
zw615 commented

So the torch.distributed code in this repo is used for dataset distillation over multiple networks. Could you please give an example of how to run distributed training? I don't see one in the README.

Also, I think splitting the data across multiple GPUs and computing gradients on each GPU would be much more convenient. But that would mean building a computation graph across multiple devices and backpropagating through a subgraph on each device. Is that feasible? Any advice?
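For what it's worth, PyTorch autograd does handle a graph that spans devices: moving a tensor between GPUs is a differentiable op, so backward() propagates across the device boundary and each parameter's gradient lands on its own device. A minimal sketch, assuming at least two GPUs and arbitrary shapes:

```python
import torch

assert torch.cuda.device_count() >= 2, "needs at least two GPUs"

x0 = torch.randn(8, 16, device="cuda:0", requires_grad=True)
w0 = torch.randn(16, 32, device="cuda:0", requires_grad=True)
w1 = torch.randn(32, 4, device="cuda:1", requires_grad=True)

h = (x0 @ w0).to("cuda:1")   # activations cross the device boundary
loss = (h @ w1).sum()
loss.backward()              # gradients flow back across devices

print(w0.grad.device)  # cuda:0 -- each gradient stays with its parameter
print(w1.grad.device)  # cuda:1
```

Whether this is practical for distillation depends on how much cross-device communication the inner training loop would require.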

ssnl commented
ssnl commented

I’m going to close this issue for now. Feel free to open a new one if you have other questions!