Redundancy?
rasbt opened this issue · 2 comments
rasbt commented
I noticed we are using
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params)
vs.
optimizer = AdamW(model.parameters())
but isn't that effectively the same? I.e., with the second form the optimizer would skip the params with requires_grad=False anyway, since they never receive a gradient?
Just curious why it was done the first way. Was there an efficiency advantage?
Andrei-Aksionov commented
So, what was the answer?
rasbt commented
There shouldn't be a difference :)
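To see why there's no difference in practice, here is a minimal pure-Python sketch (not the real torch.optim code, just the skip logic it uses): optimizers skip any parameter whose `.grad` is None, and frozen parameters never receive gradients, so pre-filtering with `requires_grad` changes nothing in the update.

```python
# Toy stand-in for a parameter; the real torch.nn.Parameter works similarly
# in that .grad stays None until autograd populates it.
class Param:
    def __init__(self, value, requires_grad=True):
        self.value = value
        self.requires_grad = requires_grad
        self.grad = None  # populated by backward() only if requires_grad

def backward(params):
    # Toy stand-in for autograd: only trainable params get a gradient.
    for p in params:
        if p.requires_grad:
            p.grad = 1.0

def sgd_step(params, lr=0.1):
    # Mirrors the guard in torch.optim steps: "if p.grad is None: continue".
    for p in params:
        if p.grad is None:
            continue
        p.value -= lr * p.grad

params = [Param(1.0), Param(1.0, requires_grad=False)]
backward(params)

# Passing all params (unfiltered) still only updates the trainable one,
# so pre-filtering by requires_grad yields the same result.
sgd_step(params)
print([p.value for p in params])  # frozen param is untouched
```

One caveat worth noting: this equivalence assumes frozen params stay frozen. If you unfreeze a parameter mid-training, an optimizer built on the pre-filtered list will never see it, while one built on `model.parameters()` will pick it up.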