facebookresearch/fairscale

FSDP on a model that has requires_grad = False

andrasiani opened this issue · 1 comment

Is it possible to wrap a teacher model in FSDP when it is in eval mode and has requires_grad=False?
My setup is knowledge distillation between a huge teacher and a small student. I'd like to partition only the teacher's weights across 6 GPUs, or both the teacher's and the student's.

Yes, it is possible, but with certain limitations that are hard to describe exactly. You can check the tests directory for some examples. In general, mixing parameters with requires_grad=True and requires_grad=False within a single FSDP wrapper is likely not a good idea.
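A minimal sketch of the setup in the question, assuming plain PyTorch (`torch.nn`) and small hypothetical stand-ins for the teacher and student. It freezes the teacher, checks whether a module's parameters all share one `requires_grad` value (the condition the answer recommends for anything placed under a single FSDP wrapper), and runs one distillation step. The FSDP wrapping itself is not executed here because it needs an initialized process group and GPUs. Wrapping the frozen teacher and the trainable student in separate FSDP instances is an assumption consistent with the advice above, not a confirmed recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def freeze(module: nn.Module) -> nn.Module:
    """Put a module in eval mode and disable gradients for all its parameters."""
    module.eval()
    for p in module.parameters():
        p.requires_grad_(False)
    return module


def has_uniform_requires_grad(module: nn.Module) -> bool:
    """True if every parameter in `module` shares the same requires_grad flag."""
    flags = {p.requires_grad for p in module.parameters()}
    return len(flags) <= 1


# Hypothetical teacher/student models, for illustration only.
teacher = freeze(nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10)))
student = nn.Linear(16, 10)

# Each model on its own is uniform, so each could be its own wrapper unit.
# (Assumption: in a real multi-GPU run you would wrap them separately, e.g.
# with fairscale's FullyShardedDataParallel, after initializing torch.distributed.)
print(has_uniform_requires_grad(teacher), has_uniform_requires_grad(student))

# A container holding both is the mixed case the answer warns against
# putting under a single FSDP wrapper.
combined = nn.ModuleDict({"teacher": teacher, "student": student})
print(has_uniform_requires_grad(combined))

# One distillation step: teacher forward without autograd, student trained.
x = torch.randn(4, 16)
with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),  # student log-probabilities
    F.softmax(teacher_logits, dim=-1),      # teacher probabilities as targets
    reduction="batchmean",
)
loss.backward()
```

Keeping the frozen and trainable parameters in separate modules means each would-be wrapper unit holds parameters with a single `requires_grad` value, which is the situation the answer describes as the safer one.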