hushell/pmf_cvpr22

Several code and training questions

cszer opened this issue · 6 comments

cszer commented

Hi, thanks for the awesome and useful paper.
I am trying to train another backbone on the 10 Meta-Dataset domains.
My questions are:

  1. What about the batch size? For ViT, bs=1 seems fine thanks to LayerNorm and the large memory consumption. Will bs=1 degrade the theoretical performance of CNNs with plain BatchNorm layers?
  2. RAM usage keeps increasing over the training steps; is that normal?

hushell commented

Hi @cszer, thanks for your questions!

  1. bs=1 means we only process one episode at a time, and each episode contains a support set of images and a query set of images. So the input to the ViT is never a single image: the effective batch size is len(support set) or len(query set).
  2. RAM usage fluctuates all the time because the support/query set sizes differ between episodes.
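The point in answer 1 can be sketched as follows. This is a minimal illustration, not the repo's actual data pipeline; the way/shot/query sizes are made-up examples (episode layouts vary in Meta-Dataset):

```python
# Minimal sketch of episodic "bs=1": one episode bundles a support set and a
# query set, so the backbone still sees a multi-image batch per forward pass.
import numpy as np

rng = np.random.default_rng(0)
n_way, k_shot, n_query = 5, 3, 12  # illustrative episode layout

episode = {
    "support": rng.standard_normal((n_way * k_shot, 3, 224, 224)),
    "query": rng.standard_normal((n_query, 3, 224, 224)),
}

# The ViT forward pass runs on len(support)=15 images, then on len(query)=12
# images -- never on a single image. LayerNorm normalizes each sample
# independently anyway, so small batches do not distort its statistics the
# way they would for BatchNorm.
print(episode["support"].shape[0], episode["query"].shape[0])  # 15 12
```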
cszer commented

About RAM: I made some changes to the dataset code (closing all HDF5 files opened via PyTables once the training tensors are created, plus deep-copying some variables into new ones). RAM usage now stays below 29 GB (it was around 45 GB before).
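The pattern behind this fix can be sketched as below. This is a hedged illustration using a numpy memory map as a stand-in for the HDF5/PyTables readers in the actual code: copy the tensors you need out of the file-backed object, then drop the handle so the OS can release its caches.

```python
# Sketch of "copy out, then close the handle" to cap RAM growth.
# (numpy memmap stands in for the HDF5 file objects in the real dataset code.)
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "episode.npy")
np.save(path, np.arange(1000, dtype=np.float32).reshape(100, 10))

mm = np.load(path, mmap_mode="r")    # lazily mapped; keeps an open file handle
batch = np.array(mm[:8], copy=True)  # deep-copy the slice into plain RAM
del mm                               # release the mapping / file handle
# `batch` remains valid; nothing else pins the file's pages in memory.
```

Holding file-backed arrays (or whole HDF5 nodes) alive across training steps is a common cause of steadily growing resident memory, which matches the ~45 GB → ~29 GB drop reported above.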

cszer commented

Thanks for the answers!

hushell commented

That sounds like a good fix. Could you send a pull request so that I can merge your code? Cheers!

codeshop715 commented

I encountered the same problem: the program terminated because the CPU ran out of RAM. How can I solve it?

@codeshop715 Unfortunately the ViT models were not memory-optimized, and training on Meta-Dataset requires a 48 GB GPU. One trick to reduce memory is to stop gradients through the ViT for the support set.
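The stop-grad trick can be sketched as below. This is a hedged, minimal PyTorch example of a ProtoNet-style episode, not the repo's actual training loop; the tiny linear `encoder` and the 5-way/3-shot sizes are stand-ins for the ViT backbone and real episode layouts:

```python
# Sketch of "stop grad on the support set": run the support forward pass
# under no_grad so its activations are never stored, and backprop only
# through the (usually smaller) query set.
import torch
import torch.nn as nn

encoder = nn.Linear(16, 8)          # stand-in for the ViT backbone
support = torch.randn(5 * 3, 16)    # 5-way 3-shot support features
query = torch.randn(10, 16)         # query features
labels = torch.randint(0, 5, (10,))

# No activations are kept for this pass, so peak memory no longer scales
# with the support-set size.
with torch.no_grad():
    z_s = encoder(support)
prototypes = z_s.view(5, 3, 8).mean(dim=1)  # class prototypes, shape (5, 8)

z_q = encoder(query)                         # gradients flow here only
logits = -torch.cdist(z_q, prototypes)       # nearest-prototype classifier
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                              # encoder updated via queries alone
```

The trade-off is that the prototypes are treated as constants during backprop, so the encoder only receives gradients from the query images.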