Abnormal checkpoint

Question

nicosouth opened this issue a year ago · 1 comment

Hello!

When I fine-tune the llama2-13b model on 2 nodes (16 GPUs), the saved checkpoint is abnormal.

Specifically, the checkpoint has the wrong size: the saved model is twice as large as the model should be.
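One quick sanity check for this kind of symptom is to compare the size on disk against parameter count × bytes per parameter. This is an assumption offered for diagnosis, not the fix reported in this thread (the answer does not say what the cause was). For a model with roughly 13 billion parameters, bf16/fp16 weights come to about 26 GB and fp32 weights to about 52 GB, so a checkpoint at exactly double the expected size is consistent with the weights having been saved in fp32 rather than half precision. The helper names and the nominal 13B parameter count below are illustrative assumptions, not part of any library.

```python
# Hypothetical helpers for a back-of-the-envelope checkpoint size check.
# Assumption: a weights-only checkpoint (no optimizer states) in a single dtype.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def expected_checkpoint_gb(num_params: int, dtype: str) -> float:
    """Approximate weights-only checkpoint size in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def size_ratio(actual_gb: float, num_params: int, dtype: str) -> float:
    """Ratio of the measured size to the expected size for the given dtype.

    A ratio near 2.0 against bf16/fp16 suggests fp32 weights (or weights
    written twice); a ratio near 1.0 means the size is as expected.
    """
    return actual_gb / expected_checkpoint_gb(num_params, dtype)


# Assumption: nominal 13B parameters, not the model's exact parameter count.
NUM_PARAMS = 13_000_000_000

half = expected_checkpoint_gb(NUM_PARAMS, "bf16")
full = expected_checkpoint_gb(NUM_PARAMS, "fp32")
print(f"bf16/fp16: ~{half:.0f} GB, fp32: ~{full:.0f} GB")
```

Plug in the measured size of the saved checkpoint with `size_ratio(actual_gb, NUM_PARAMS, "bf16")`. A result close to 2.0 points toward a dtype or duplicate-save issue rather than a corrupted checkpoint.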

Answer 1 · 2023-08-23T01:33:49.000Z

It has been solved.