Lightning-AI/litgpt

Redundancy?

rasbt opened this issue · 2 comments

I noticed we are using

from torch.optim import AdamW

trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params)

vs.

optimizer = AdamW(model.parameters())

but isn't it the same? I.e., wouldn't the 2nd effectively skip all those params with requires_grad=False anyway?

Just curious why it was done the 1st way. Was there an efficiency advantage?

So, what was the answer?

There shouldn't be a difference :)
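For anyone landing here later, a minimal sketch (not from the repo) that checks this directly: AdamW's step skips parameters whose .grad is None, and frozen parameters never receive a gradient, so both constructions should produce identical results. The toy model and the run_step helper below are made up for illustration.

import torch
from torch import nn
from torch.optim import AdamW

def run_step(filter_trainable):
    # Hypothetical toy setup: a 2-layer model with the first layer frozen.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
    for p in model[0].parameters():
        p.requires_grad = False

    # Either pass only the trainable params or all params.
    if filter_trainable:
        params = [p for p in model.parameters() if p.requires_grad]
    else:
        params = model.parameters()
    optimizer = AdamW(params, lr=0.1)

    # One forward/backward/step; frozen params never get a .grad,
    # so the optimizer leaves them untouched in both cases.
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()
    return model.state_dict()

sd_a = run_step(filter_trainable=True)
sd_b = run_step(filter_trainable=False)
print(all(torch.equal(sd_a[k], sd_b[k]) for k in sd_a))  # expected: True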