OptimalScale/LMFlow

abnormal checkpoint

nicosouth opened this issue · 1 comment

Hello!

When I fine-tune the llama2-13b model on 2 nodes (16 GPUs), the saved checkpoint is abnormal.

Specifically, the checkpoint size is wrong: the saved model is twice the expected model size.

It has been solved.