NVIDIA/Megatron-LM

[QUESTION] The optimizer state already contains a 32-bit (fp32) copy of the model parameters. Why does the checkpoint also store a separate copy of the model parameters?


The optimizer state already contains a 32-bit (fp32) copy of the model parameters. Why does the checkpoint also store a separate copy of the model parameters?
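For context, here is a minimal pure-Python sketch (not Megatron-LM code; `to_bf16` is a hypothetical helper) of the setup the question describes: in mixed-precision training the optimizer keeps full-precision fp32 "master" parameters for accumulating updates, while the model itself holds a low-precision (e.g. bf16) working copy used for forward/backward. A checkpoint that saves both `model` and `optimizer` state therefore appears to store the parameters twice, even though the two copies are not bit-identical:

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate an fp32 value to bf16 precision by zeroing the low 16 bits.
    Hypothetical helper used only to simulate the precision loss."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# fp32 master weights live in the optimizer state
optimizer_state = {"master_params": [0.1, 0.2, 0.3]}

# the model holds the low-precision working copy cast down from the masters
model_params = [to_bf16(p) for p in optimizer_state["master_params"]]

# a checkpoint typically saves both states; the two parameter copies
# differ because casting fp32 -> bf16 discards mantissa bits
checkpoint = {"model": model_params, "optimizer": optimizer_state}

for m, f in zip(checkpoint["model"], checkpoint["optimizer"]["master_params"]):
    print(f"model(bf16)={m!r}  master(fp32)={f!r}  equal={m == f}")
```

Running this shows `equal=False` for values like 0.1 that are not exactly representable in bf16, which is one concrete way the checkpointed model copy and the optimizer's fp32 copy can legitimately diverge.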