How to use `l2l.vision.models.ResNet12`?
Jeong-Bin opened this issue · 4 comments
Hi, I'm using l2l to create a large MAML model.
However, I have a question regarding the usage of `l2l.vision.models.ResNet12` or `WRN28`.
I tried the following 3 methods.
# Method 1
class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)

features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 84)))
features.to(device)
head = torch.nn.Linear(84, ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)
In Method 1, the error `RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [5]` occurred.
Also, when I modified the code to `lambda x: x.view(-1, 256)` and `torch.nn.Linear(256, ways)`, the error `RuntimeError: mat1 and mat2 shapes cannot be multiplied (1260x84 and 256x5)` occurred.
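Both messages are consistent with the linear head being called on raw 3x84x84 images instead of on the ResNet12 features, since 5 * 3 * 84 = 1260. This is an inference from the reported shapes, not something confirmed in the thread. A minimal sketch, assuming a 5-way 1-shot batch of random images stands in for the real data (the `error_of` helper is hypothetical and only collects the messages):

```python
import torch
import torch.nn.functional as F

ways = 5
# Assumption: a 5-way 1-shot support set of raw 3x84x84 images.
images = torch.randn(ways, 3, 84, 84)
labels = torch.arange(ways)

# Linear(84, ways) only transforms the last dimension, so the output stays 4-D.
logits = torch.nn.Linear(84, ways)(images)
print(logits.shape)  # torch.Size([5, 3, 84, 5])


def error_of(fn):
    """Hypothetical helper: return the message raised by fn(), or None."""
    try:
        fn()
    except (RuntimeError, ValueError) as exc:
        return str(exc)
    return None


# For 4-D predictions, cross_entropy expects 3-D targets, so labels of size [5] fail.
loss_error = error_of(lambda: F.cross_entropy(logits, labels))

# Linear(256, ways) treats the images as a (5*3*84) x 84 matrix, which cannot
# be multiplied with its 256 x 5 weight.
matmul_error = error_of(lambda: torch.nn.Linear(256, ways)(images))

print(loss_error)
print(matmul_error)
```

If this reading applies, the adaptation loop would need to pass the data through `features` before calling the head; the exact wording of the messages can differ between PyTorch versions.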
# Method 2
features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 256)))
features.to(device)
head = l2l.vision.models.MiniImagenetCNN(ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)
Method 2 worked, but its test accuracy was lower than that of the basic MAML model.
I used the following code for basic MAML.
model = l2l.vision.models.MiniImagenetCNN(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)
# Method 3
model = l2l.vision.models.ResNet12(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)
Method 3 worked during training, but I encountered an `OutOfMemoryError` during testing.
(The training was also very slow.)
What is the right way, and what should I modify?
Or is there any other way to make a large MAML model?
I set the training and testing configurations as follows:
# train setting
ways=5
shot=1
adaptation_steps=5
batch_size=4
meta_lr=1e-3
fast_lr=0.01
# test setting
ways=5
shot=15
adaptation_steps=10
batch_size=4
fast_lr=0.01
Hello @Jeong-Bin,
Method 3 is correct. Try using `maml.clone(first_order=True)` when testing. Alternatively, you can reduce the number of adaptation steps (at the price of performance).
How much GPU memory do you have? If you have more than one GPU, you can use `model.features = torch.nn.DataParallel(model.features)` to distribute the activations across the GPUs.
@seba-1511
Thanks! My GPU is an RTX 3090 Ti with 20GB of memory.
I'll try your solution.
Additionally, I looked up 'adaptation steps' in the MAML paper.
In section A.1 (Classification) of the paper, the authors use 10 gradient steps at test time.
Does a gradient step mean an adaptation step?
Yes, gradient steps are adaptation steps.
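As a dependency-free illustration of that equivalence, the sketch below treats each adaptation step as one plain gradient-descent update with step size `fast_lr`. The scalar parameter, the toy quadratic loss, and the `adapt` function are assumptions for illustration only, not learn2learn code.

```python
def adapt(theta, grad_fn, fast_lr, adaptation_steps):
    """Apply `adaptation_steps` gradient-descent updates to theta."""
    for _ in range(adaptation_steps):
        theta = theta - fast_lr * grad_fn(theta)
    return theta


# Toy task loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def grad(theta):
    return 2.0 * (theta - 3.0)


theta_1 = adapt(0.0, grad, fast_lr=0.01, adaptation_steps=1)
theta_10 = adapt(0.0, grad, fast_lr=0.01, adaptation_steps=10)
print(theta_1, theta_10)
```

With `adaptation_steps=10`, the parameter ends up closer to the task optimum than with a single step, which is the role the 10 test-time gradient steps play in the paper.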
All right, thank you for your help! Have a nice day😊