Several code and training questions
cszer opened this issue · 6 comments
Hi, thanks for the awesome and useful paper.
I am trying to train a different backbone on the 10 Meta-Dataset datasets.
My questions are:
- What about batch size? For ViT it is OK to use bs=1 due to layer norm and large memory consumption. Will this degrade the theoretical performance of CNNs with plain batch norm layers?
- RAM usage is increasing over training steps; is this normal?
Hi @cszer, thanks for your questions!
- bs=1 means we only have 1 episode at a time, which contains a support set of images and a query set of images. So the input to the ViT is never bs=1, but bs=len(support set) or bs=len(query set) (see the sketch after this list).
- RAM usage is changing all the time due to different support/query set sizes.
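For illustration, here is a minimal sketch of what one episode looks like to the backbone (the shapes and the backbone module are placeholders, not the repo's actual code):

```python
import torch

# One "episode" = one support set + one query set.
# Sizes vary per episode in Meta-Dataset; these numbers are made up.
n_support, n_query = 25, 50
support = torch.randn(n_support, 3, 224, 224)
query = torch.randn(n_query, 3, 224, 224)

# Stand-in for the ViT backbone, just for illustration.
backbone = torch.nn.Conv2d(3, 8, 3)

# Even though the episode loader yields "bs=1" (one episode), the backbone
# sees a real batch: bs = len(support) or bs = len(query).
support_feats = backbone(support)   # batch of 25 images
query_feats = backbone(query)      # batch of 50 images
```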
About RAM: I made some changes to the dataset code (closing all HDF5 files opened with tables after the training tensors are created + deep-copying some variables) -> RAM usage is now no more than 29 GB (before it was around 45).
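Roughly what the change looks like (a minimal sketch; the file path and node name are placeholders):

```python
import copy
import tables  # PyTables
import torch

# Idea: read the data, copy it out, and close the HDF5 file right away,
# so open handles and their caches don't accumulate across episodes.
def load_tensors(h5_path):
    with tables.open_file(h5_path, mode="r") as f:  # file is closed on exit
        arr = f.root.images[:]                      # reads the node into a numpy array
    arr = copy.deepcopy(arr)  # defensive copy, mirroring the deepcopy step above
    return torch.from_numpy(arr)
```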
Thanks for the answers!
That sounds like a good fix. Could you send a pull request so that I can merge your code? Cheers!
I encountered the same problem: the program terminated because the CPU ran out of RAM. How can I solve it?
@codeshop715 Unfortunately the ViT models were not optimized, and training on Meta-Dataset requires a 48 GB GPU. One trick to reduce memory is stopping grad on the ViT for the support set.
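A minimal sketch of that trick, assuming a PyTorch setup (`vit` and `head` are placeholder modules, not the repo's actual names):

```python
import torch

# Run the support set under no_grad() so the ViT stores no activations for
# backprop on that pass, which lowers peak GPU memory during training.
def episode_forward(vit, head, support, support_labels, query):
    with torch.no_grad():             # no grad graph for the support images
        support_feats = vit(support)
    query_feats = vit(query)          # gradients still flow through the query pass
    # the head (e.g., a prototype/classifier layer) consumes both feature sets;
    # its signature here is hypothetical
    return head(support_feats, support_labels, query_feats)
```

The trade-off is that the ViT receives gradient updates only through the query-set pass, but it cuts the stored activations roughly in half.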