gordicaleksa/Open-NLLB

Reduce peak memory when using FSDP on 2+ GPUs

gordicaleksa opened this issue · 0 comments

Figure out the peak memory issue with FSDP when running the 615M-parameter model on 2 GPUs, which I described here:

facebookresearch/fairseq#5318
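
For a rough baseline to compare measured peaks against, here is a back-of-envelope sketch of per-GPU memory under full sharding. Everything in it is an assumption rather than something taken from the linked issue: fp32 parameters and gradients (4 bytes each), an Adam-style optimizer with 8 bytes of state per parameter, and a transient all-gather of the largest FSDP-wrapped unit. The function names are hypothetical, and activations, unsharded gradient buffers, and allocator fragmentation are ignored, so real peaks will be higher.

```python
def fsdp_steady_state_bytes_per_gpu(num_params, world_size,
                                    bytes_per_param=4,
                                    optimizer_state_bytes_per_param=8):
    """Approximate per-GPU bytes for sharded params, grads and optimizer state.

    Assumes full sharding: each rank holds 1/world_size of every tensor.
    Defaults (fp32 weights, Adam-style state) are assumptions, not values
    taken from the issue.
    """
    # params + grads (same dtype) + optimizer state, per parameter
    per_param = 2 * bytes_per_param + optimizer_state_bytes_per_param
    return num_params * per_param / world_size


def fsdp_peak_bytes_per_gpu(num_params, world_size, largest_unit_params,
                            bytes_per_param=4,
                            optimizer_state_bytes_per_param=8):
    """Steady-state shard plus the transient full copy of one wrapped unit.

    While a unit is being computed, its parameters are all-gathered in full
    on every rank. If the whole model is wrapped as a single unit, that
    transient copy is the entire model.
    """
    steady = fsdp_steady_state_bytes_per_gpu(
        num_params, world_size, bytes_per_param,
        optimizer_state_bytes_per_param)
    transient = largest_unit_params * bytes_per_param
    return steady + transient


if __name__ == "__main__":
    n = 615_000_000  # parameter count from the issue title
    for unit in (n, n // 24):  # one big unit vs. a hypothetical per-layer split
        peak = fsdp_peak_bytes_per_gpu(n, world_size=2, largest_unit_params=unit)
        print(f"largest unit = {unit:>11,d} params -> ~{peak / 1e9:.2f} GB per GPU")
```

If the measured peak is well above this estimate, the gap is likely down to terms the sketch leaves out (activations, unsharded gradients during backward, caching-allocator overhead) or to a coarser wrapping granularity than expected.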