Question about deepspeed checkpoint loading
I tried to load LoRA training adapters from a DeepSpeed checkpoint:
dir:
ls Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
total 696M
-rw-r--r-- 1 schwan46494@gmail.com CU 775 Nov 18 11:03 adapter_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU 686M Nov 18 11:03 adapter_model.safetensors
-rw-rw-r-- 1 schwan46494@gmail.com CU 1.4K Nov 18 16:54 config.json
drwxr-xr-x 2 schwan46494@gmail.com CU 4.0K Nov 18 11:03 global_step6000
-rw-r--r-- 1 schwan46494@gmail.com CU 15 Nov 18 11:03 latest
-rw-r--r-- 1 schwan46494@gmail.com CU 5.1K Nov 18 11:03 README.md
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_0.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_1.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_2.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_3.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_4.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_5.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_6.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 16K Nov 18 11:03 rng_state_7.pth
-rw-r--r-- 1 schwan46494@gmail.com CU 1.1K Nov 18 11:03 scheduler.pt
-rw-r--r-- 1 schwan46494@gmail.com CU 221 Nov 18 11:03 special_tokens_map.json
-rw-r--r-- 1 schwan46494@gmail.com CU 50K Nov 18 11:03 tokenizer_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU 8.7M Nov 18 11:03 tokenizer.json
-rw-r--r-- 1 schwan46494@gmail.com CU 1023K Nov 18 11:03 trainer_state.json
-rw-r--r-- 1 schwan46494@gmail.com CU 6.5K Nov 18 11:03 training_args.bin
-rwxr--r-- 1 schwan46494@gmail.com CU 25K Nov 18 11:03 zero_to_fp32.py
instead of the usual Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2, because I want to do error analysis of when my model starts to degrade.
I load it with this code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_path is Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
# base_model_path is a Bunny variant model.
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,  # float32 for CPU
    device_map='auto',
    trust_remote_code=True)
model.load_adapter(model_path)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True)
However, I get this warning:
Some weights of the model checkpoint at /home/11001207/chawanP/Teerapol/llama-3-typhoon-v1.5-8b-vision-preview were not used when initializing BunnyLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_bias', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_weight', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.layernorm.bias', 'model.vision_tower.vision_tower.vision_model.head.layernorm.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.head.probe']
- This IS expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Loading adapter weights from /home/11001207/chawanP/pak/Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000 led to unexpected keys not found in the model: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_B.default.weight'].
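For reference, a quick way to list which modules actually carry LoRA weights in adapter_model.safetensors, so I can compare against the "unexpected keys" above (sketch only, assuming the safetensors package is installed; the path is the checkpoint-6000 directory listed earlier):

from safetensors import safe_open

# Path to the adapter file inside the DeepSpeed step directory shown above.
adapter_file = "Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000/adapter_model.safetensors"

with safe_open(adapter_file, framework="pt") as f:
    lora_keys = [k for k in f.keys() if "lora_" in k]

# Group by the module each LoRA tensor belongs to (strip the lora_A/lora_B suffix).
modules = sorted({k.rsplit(".lora_", 1)[0] for k in lora_keys})
vision_modules = [m for m in modules if "vision_tower" in m]

print(f"{len(lora_keys)} LoRA tensors across {len(modules)} modules")
print(f"{len(vision_modules)} of them are inside the vision tower")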
Questions
- Is the model not fusing the vision adapters?
- How can I load/convert these checkpoints? (Their layout is different from the usual output directory: they have no non_lora_trainables.bin, no final config.json, and so on.)
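For context, this is roughly the loading path I would expect to work once the checkpoint is in the usual layout: attaching the adapter with peft's PeftModel.from_pretrained instead of model.load_adapter, and optionally merging it (sketch only; base_model_path is a placeholder here, the adapter path is the checkpoint-6000 directory above):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model_path = "path/to/bunny-variant-base"  # placeholder
adapter_path = "Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000"

# Load the base model, then wrap it with the LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
model = PeftModel.from_pretrained(model, adapter_path)
# model = model.merge_and_unload()  # optionally fuse the LoRA weights into the base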