Address frozen parameter warning with FSDP on nightly torch
Opened this issue · 2 comments
carmocca commented
PEFT finetuning (LoRA, adapter) raises the following warning for each FSDP-wrapped layer (transformer block in our case):
```
The following parameters have requires_grad=True:
['transformer.h.0.attn.attn.lora_A', 'transformer.h.0.attn.attn.lora_B']
The following parameters have requires_grad=False:
['transformer.h.0.norm_1.weight', 'transformer.h.0.norm_1.bias', 'transformer.h.0.norm_2.weight', 'transformer.h.0.norm_2.bias', 'transformer.h.0.attn.attn.linear.weight', 'transformer.h.0.attn.attn.linear.bias', 'transformer.h.0.attn.proj.linear.weight', 'transformer.h.0.attn.proj.linear.bias', 'transformer.h.0.mlp.fc.linear.weight', 'transformer.h.0.mlp.fc.linear.bias', 'transformer.h.0.mlp.proj.linear.weight', 'transformer.h.0.mlp.proj.linear.bias']
  warnings.warn(msg)
/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py:174: UserWarning: transformer.h.1 has both parameters with requires_grad=True and False. We do not recommend wrapping such modules since the gradient memory usage will be higher than expected (201510912 numel instead of 131072 numel before sharding via reduce-scatter). If possible, wrap the frozen parameters with FSDP separately.
```
This should be looked into, or silenced if we don't want to act on it.
RuABraun commented
Is changing the code so the LoRA parameters live in a separate module an option? I don't see how else you could wrap the LoRA parameters into a separate FSDP unit. I might be able to help.
MaxGonzalezSaez-Diez commented
Still occurring.