FSDP on a model that has requires_grad=False
andrasiani opened this issue · 1 comments
andrasiani commented
Is it possible to wrap a teacher model in FSDP that is in eval mode and has requires_grad=False?
My setup is knowledge distillation between a huge teacher and a small student. I'd like to partition only the teacher's weights across 6 GPUs, or both the teacher's and the student's.
min-xu-ai commented
Yes, it is possible, but with certain limitations that are hard to describe exactly. You can check the tests dir to see some examples. In general, mixing params with requires_grad=True and params with requires_grad=False within a single FSDP wrapper is likely not a good idea.
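To illustrate the "don't mix" advice, here is a minimal sketch of freezing the whole teacher before wrapping it, so that every parameter inside the wrapper shares the same requires_grad flag. It assumes the fairscale FSDP implementation and an already initialized torch.distributed process group; the helper names (`freeze`, `has_uniform_requires_grad`, `wrap_frozen_teacher`) are hypothetical and not part of any library. It is untested against the limitations mentioned above, so treat it as a starting point rather than a confirmed recipe.

```python
import torch.nn as nn


def freeze(module: nn.Module) -> nn.Module:
    """Put a module in eval mode and disable gradients for all its params."""
    module.eval()
    for p in module.parameters():
        p.requires_grad_(False)
    return module


def has_uniform_requires_grad(module: nn.Module) -> bool:
    """Return True if all params share one requires_grad value (no mixing)."""
    flags = {p.requires_grad for p in module.parameters()}
    return len(flags) <= 1


def wrap_frozen_teacher(teacher: nn.Module) -> nn.Module:
    """Hypothetical helper: freeze the teacher, then shard it with FSDP.

    Assumes fairscale is installed and torch.distributed.init_process_group
    has already been called on every rank.
    """
    # Imported lazily so the helpers above can be used without fairscale.
    from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

    teacher = freeze(teacher)
    # Guard against mixing trainable and frozen params in one wrapper.
    assert has_uniform_requires_grad(teacher)
    return FSDP(teacher)
```

In the distillation loop, the teacher forward pass would then typically run under `torch.no_grad()`, while the student is wrapped (or not) separately, so that the frozen and trainable parameters never share a wrapper.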