nicosouth opened this issue a year ago · 1 comment
Hello!
When I fine-tune the Llama2-13B model with 2 nodes (16 GPUs), I find that the saved checkpoint is abnormal: the checkpoint file is roughly double the expected model size.
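The issue does not state the cause, but a common reason for a checkpoint coming out at roughly twice the expected size is weights written in fp32 rather than fp16/bf16 (4 bytes vs. 2 bytes per parameter), or optimizer state saved alongside the model. Below is a minimal diagnostic sketch, assuming the checkpoint is a standard PyTorch state_dict and using a hypothetical `checkpoint.pt` path:

```python
import torch

# Load the checkpoint on CPU and report per-dtype parameter counts and sizes.
# "checkpoint.pt" is a hypothetical path; adjust for your setup.
ckpt = torch.load("checkpoint.pt", map_location="cpu")
# Some training frameworks nest the weights under a "model" key.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

bytes_by_dtype = {}
for name, t in state.items():
    if torch.is_tensor(t):
        bytes_by_dtype.setdefault(t.dtype, 0)
        bytes_by_dtype[t.dtype] += t.numel() * t.element_size()

for dtype, nbytes in bytes_by_dtype.items():
    print(f"{dtype}: {nbytes / 1e9:.2f} GB")
# For Llama2-13B (~13e9 params): expect ~26 GB in fp16/bf16, ~52 GB in fp32.
```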
It has been solved.