taoyang1122/adapt-image-models

It seems that the training memory is not reduced

casillas1111 opened this issue · 1 comment

Thank you for your excellent work.

After adding the Adapter, I passed only the Adapter parameters to the optimizer. However, the training memory usage did not go down. I verified that the remaining Transformer parameters were set to requires_grad = False. The code is as follows:

# Freeze everything except the Adapter parameters
for name, param in model.named_parameters():
    if "Adapter" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

# Pass only the trainable (Adapter) parameters to the optimizer
optimizer = torch.optim.AdamW(
    params=filter(lambda p: p.requires_grad, model.parameters()),
    lr=3e-4,
    weight_decay=0.05,
)
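
A quick sanity check along these lines (a minimal sketch; the count_trainable helper is hypothetical, not part of the repository) prints how many parameters remain trainable after freezing:

def count_trainable(model):
    # Count trainable vs. total parameters to confirm the freezing worked
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / total: {total:,} ({100.0 * trainable / total:.2f}%)")

count_trainable(model)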

Looking forward to your reply.

Hi, you can unfreeze the other model parameters by commenting out https://github.com/taoyang1122/adapt-image-models/blob/main/tools/train.py#L187-L189. Then you can compare the memory cost. Note that the memory saving is not as large as the reduction in the number of trainable parameters would suggest.
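
For the comparison itself, one simple way (a minimal sketch, not from the repository; assumes a single-GPU run) is to record the peak CUDA memory in both configurations:

import torch

# Reset the peak-memory counter before the run being measured
torch.cuda.reset_peak_memory_stats()

# ... run a few training iterations (Adapter-only vs. fully unfrozen) ...

peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory: {peak_gb:.2f} GB")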