BAAI-DCAI/Bunny

Question about DeepSpeed checkpoint loading

Opened this issue · 1 comment

I tried to load the LoRA training adapter from a DeepSpeed checkpoint.
Directory contents:

ls Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
total 696M
-rw-r--r-- 1 schwan46494@gmail.com CU   775 Nov 18 11:03 adapter_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU  686M Nov 18 11:03 adapter_model.safetensors
-rw-rw-r-- 1 schwan46494@gmail.com CU  1.4K Nov 18 16:54 config.json
drwxr-xr-x 2 schwan46494@gmail.com CU  4.0K Nov 18 11:03 global_step6000
-rw-r--r-- 1 schwan46494@gmail.com CU    15 Nov 18 11:03 latest
-rw-r--r-- 1 schwan46494@gmail.com CU  5.1K Nov 18 11:03 README.md
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_0.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_1.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_2.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_3.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_4.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_5.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_6.pth
-rw-r--r-- 1 schwan46494@gmail.com CU   16K Nov 18 11:03 rng_state_7.pth
-rw-r--r-- 1 schwan46494@gmail.com CU  1.1K Nov 18 11:03 scheduler.pt
-rw-r--r-- 1 schwan46494@gmail.com CU   221 Nov 18 11:03 special_tokens_map.json
-rw-r--r-- 1 schwan46494@gmail.com CU   50K Nov 18 11:03 tokenizer_config.json
-rw-r--r-- 1 schwan46494@gmail.com CU  8.7M Nov 18 11:03 tokenizer.json
-rw-r--r-- 1 schwan46494@gmail.com CU 1023K Nov 18 11:03 trainer_state.json
-rw-r--r-- 1 schwan46494@gmail.com CU  6.5K Nov 18 11:03 training_args.bin
-rwxr--r-- 1 schwan46494@gmail.com CU   25K Nov 18 11:03 zero_to_fp32.py

I am loading checkpoint-6000 instead of the usual Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2 output directory, because I want to perform error analysis of where my model starts to degrade.

This is the code I use:

# model_path is Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000
# base_model_path is a Bunny variant model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model first, then attach the LoRA adapter from the intermediate checkpoint
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,  # use torch.float32 on CPU
    device_map='auto',
    trust_remote_code=True)
model.load_adapter(model_path)

# The checkpoint directory also contains the tokenizer files
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True)
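
For reference, I believe the peft-native equivalent is roughly the following (a minimal sketch, untested with Bunny's custom classes; it assumes the checkpoint's adapter_config.json / adapter_model.safetensors follow the standard PEFT layout):

from peft import PeftModel

# Attach the LoRA adapter stored in checkpoint-6000 to the already-loaded base model
model = PeftModel.from_pretrained(model, model_path)
# Optionally merge the LoRA weights into the base weights for inference:
# model = model.merge_and_unload()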

However, with the transformers load_adapter path above, I get this warning:

Some weights of the model checkpoint at /home/11001207/chawanP/Teerapol/llama-3-typhoon-v1.5-8b-vision-preview were not used when initializing BunnyLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_bias', 'model.vision_tower.vision_tower.vision_model.head.attention.in_proj_weight', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.head.attention.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.head.layernorm.bias', 'model.vision_tower.vision_tower.vision_model.head.layernorm.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.head.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.head.probe']

  • This IS expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BunnyLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Loading adapter weights from /home/11001207/chawanP/pak/Bunny/checkpoints-llama3-8b/bunny-lora-llama3-8b-attempt2/checkpoint-6000 led to unexpected keys not found in the model: ['model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.lora_B.default.weight'].
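
For reference, the keys actually stored in the adapter can be listed straight from adapter_model.safetensors (a minimal sketch using the safetensors safe_open API), which is how the vision_tower ...layers.26... entries above can be checked:

from safetensors import safe_open

# List the LoRA tensors saved in the intermediate checkpoint, filtered to the
# vision tower, to compare against the "unexpected keys" reported above.
with safe_open(f"{model_path}/adapter_model.safetensors", framework="pt") as f:
    for key in f.keys():
        if "vision_tower" in key:
            print(key)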

Question

  • Is the model failing to fuse the vision-tower LoRA adapters?
  • How should I load/convert these intermediate checkpoints? Their schema is different from the final output directory: they are missing non_lora_trainable.bin, config.json, and more. (A sketch of what I think the conversion might look like is below.)
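
For the second question, is the intended workflow something like the sketch below? zero_to_fp32.py is the script DeepSpeed places next to the checkpoint, and get_fp32_state_dict_from_zero_checkpoint from deepspeed.utils.zero_to_fp32 should be the programmatic equivalent; I have not verified this against Bunny's own loading code.

# Untested sketch: consolidate the ZeRO shards in global_step6000 into a single
# fp32 state dict, then look at the recovered trainable (LoRA) tensors.
# CLI alternative, run from inside checkpoint-6000: python zero_to_fp32.py . <output>
# (see the script's --help for the exact arguments of your DeepSpeed version)
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint(model_path)  # model_path = .../checkpoint-6000
lora_keys = [k for k in state_dict if "lora_" in k]
print(f"{len(lora_keys)} LoRA tensors recovered")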