NVIDIA/Megatron-LM

[QUESTION] I encountered the following issue when executing your command. What could be the cause? args.exit_on_missing_checkpoint is: True >> '--exit-on-missing-checkpoint' set ... exiting. <<

Opened this issue · 1 comment

I think you need to add the `--loader` and `--saver` arguments. Can you try this once?

python tools/checkpoint/convert.py --model-type GPT --loader mcore --saver mcore --megatron-path $megatron_folder --load-dir $load_dir --save-dir $save_dir

Also, when you train the model, you can pass `--ckpt-format torch` to save checkpoints in the torch format.

Originally posted by @argitrage in #1291

Can you check that you are using the correct `$load_dir`, i.e. the folder that actually contains the checkpoint? I encountered this error when I was passing the wrong folder with no checkpoint inside.
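A quick way to sanity-check the directory before converting is to look for the `latest_checkpointed_iteration.txt` marker file that Megatron-LM writes at the root of a checkpoint directory (a sketch assuming that standard layout; the `check_ckpt_dir` helper and paths are illustrative, not part of the repo):

```shell
# Hypothetical helper: succeeds if the directory looks like a
# Megatron-LM checkpoint root (contains the iteration marker file).
check_ckpt_dir() {
    [ -f "$1/latest_checkpointed_iteration.txt" ]
}

# Example usage before running tools/checkpoint/convert.py:
load_dir="/path/to/checkpoints"   # replace with your actual $load_dir
if check_ckpt_dir "$load_dir"; then
    echo "checkpoint found: iteration $(cat "$load_dir/latest_checkpointed_iteration.txt")"
else
    echo "no checkpoint in $load_dir -- this is what triggers --exit-on-missing-checkpoint" >&2
fi
```

If the marker file is missing, double-check that `$load_dir` points at the checkpoint root rather than a parent or subfolder.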