Several code and training questions
cszer opened this issue · 6 comments
Hi, thanks for the awesome and useful paper.
I am trying to train a different backbone on the 10 Meta-Dataset datasets.
My questions are:
- What about batch size? For ViT it is OK to use bs=1 due to layer norm and large memory consumption. Will this degrade the theoretical performance of CNNs with plain batch norm layers?
- RAM usage is increasing over training steps; is this normal?
Hi @cszer, thanks for your questions!
- bs=1 means we only have 1 episode at a time, which contains a support set of images and a query set of images. So the input to the ViT is never bs=1, but bs=len(support set) or bs=len(query set) (see the sketch after this list).
- RAM usage is changing all the time due to different support/query set sizes.
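For illustration, here is a minimal sketch of what one episode looks like to the backbone (the shapes and the backbone module are placeholders, not the repo's actual code):

```python
import torch

# One "episode" = one support set + one query set.
# Sizes vary per episode in Meta-Dataset; these numbers are made up.
n_support, n_query = 25, 50
support = torch.randn(n_support, 3, 224, 224)
query = torch.randn(n_query, 3, 224, 224)

# Stand-in for the ViT backbone, just for illustration.
backbone = torch.nn.Conv2d(3, 8, 3)

# Even though the episode loader yields "bs=1" (one episode), the backbone
# sees a real batch: bs = len(support) or bs = len(query).
support_feats = backbone(support)   # batch of 25 images
query_feats = backbone(query)      # batch of 50 images
```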
About RAM: I made some changes to the dataset code (closing all HDF5 files opened with tables after the training tensors are created + deep-copying some variables) -> RAM usage is now no more than 29 GB (before it was around 45).
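Roughly what the change looks like (a minimal sketch; the file path and node name are placeholders):

```python
import copy
import tables  # PyTables
import torch

# Idea: read the data, copy it out, and close the HDF5 file right away,
# so open handles and their caches don't accumulate across episodes.
def load_tensors(h5_path):
    with tables.open_file(h5_path, mode="r") as f:  # file is closed on exit
        arr = f.root.images[:]                      # reads the node into a numpy array
    arr = copy.deepcopy(arr)  # defensive copy, mirroring the deepcopy step above
    return torch.from_numpy(arr)
```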
Thanks for the answers!
That sounds like a good fix. Could you send a pull request so that I can merge your code? Cheers!
I encountered the same problem: the program terminated because the CPU ran out of RAM. How can I solve it?
@codeshop715 Unfortunately the ViT models were not optimized, and training on Meta-Dataset requires a 48 GB GPU. One trick to reduce memory is stopping grad on the ViT for the support set.
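A minimal sketch of that trick, assuming a PyTorch setup (`vit` and `head` are placeholder modules, not the repo's actual names):

```python
import torch

# Run the support set under no_grad() so the ViT stores no activations for
# backprop on that pass, which lowers peak GPU memory during training.
def episode_forward(vit, head, support, support_labels, query):
    with torch.no_grad():             # no grad graph for the support images
        support_feats = vit(support)
    query_feats = vit(query)          # gradients still flow through the query pass
    # the head (e.g., a prototype/classifier layer) consumes both feature sets;
    # its signature here is hypothetical
    return head(support_feats, support_labels, query_feats)
```

The trade-off is that the ViT receives gradient updates only through the query-set pass, but it cuts the stored activations roughly in half.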