Lightning-AI/litgpt

Redundancy?

rasbt opened this issue · 2 comments

I noticed we are using

from torch.optim import AdamW

trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params)

vs.

optimizer = AdamW(model.parameters())

but isn't it the same? I.e., wouldn't the 2nd effectively skip all those params with requires_grad=False anyway?

Just curious why it was done the 1st way. Was there an efficiency advantage?

So, what was the answer?

There shouldn't be a difference :)
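For anyone landing here later, a minimal sketch (not from the repo) that checks this directly: AdamW's step skips parameters whose .grad is None, and frozen parameters never receive a gradient, so both constructions should produce identical results. The toy model and the run_step helper below are made up for illustration.

import torch
from torch import nn
from torch.optim import AdamW

def run_step(filter_trainable):
    # Hypothetical toy setup: a 2-layer model with the first layer frozen.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
    for p in model[0].parameters():
        p.requires_grad = False

    # Either pass only the trainable params or all params.
    if filter_trainable:
        params = [p for p in model.parameters() if p.requires_grad]
    else:
        params = model.parameters()
    optimizer = AdamW(params, lr=0.1)

    # One forward/backward/step; frozen params never get a .grad,
    # so the optimizer leaves them untouched in both cases.
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    return model.state_dict()

sd_a = run_step(filter_trainable=True)
sd_b = run_step(filter_trainable=False)
print(all(torch.equal(sd_a[k], sd_b[k]) for k in sd_a))  # expected: True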