torch_fsdp_example

A usage example showing the benefit of FSDP (ZeRO-3) over default DDP.


A minimal usage example of torch FSDP (Fully Sharded Data Parallel, https://pytorch.org/docs/stable/fsdp.html), which implements ZeRO-3 (https://arxiv.org/pdf/1910.02054.pdf).
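A minimal sketch of what such an FSDP setup can look like; the model, dimensions, and script name (`fsdp_example.py`) are illustrative placeholders, not the repo's actual code:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for us.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A toy model; in practice this would be large enough
    # for sharding to matter.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 1024),
    ).cuda()

    # Wrapping in FSDP shards parameters, gradients, and optimizer state
    # across ranks (ZeRO-3), instead of replicating them on every GPU
    # as DDP does.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for _ in range(10):
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).square().mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=4 fsdp_example.py`.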

Requires 4 GPUs, but you can tune that to the number you have. Shows per-GPU memory usage decreasing roughly linearly with the number of GPUs, compared to default torch DDP, which replicates the full model on every GPU. Also shows why you should pass `cache_enabled=False` to `torch.autocast` when there is only one forward pass.
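To illustrate the autocast point: `torch.autocast` caches the half-precision casts of weights so they can be reused across multiple forward passes within the same context; with a single forward pass the cache never gets reused and only holds extra memory. A minimal sketch:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(8, 1024, device="cuda")

# With cache_enabled=True (the default), autocast keeps half-precision
# copies of the weights alive until the context exits. That only pays off
# when the same weights are used in several forward passes inside the
# context; for a single pass, the cache is pure memory overhead.
with torch.autocast(device_type="cuda", dtype=torch.float16, cache_enabled=False):
    out = model(x)
```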