Victorwz/LLaVA-Llama-3

Merge issue? config.json has "architectures" as "LlamaForCausalLM" instead of "LlavaLlamaForCausalLM" in the final merged and adapter models

The pretrained checkpoint "weizhiwang/llava-v1.5-llama-3-8b-pretrain-clip-large-336px" has "architectures" set to "LlamaForCausalLM" in its config.json, while the fine-tuned model "weizhiwang/LLaVA-Llama-3-8B" has "LlavaLlamaForCausalLM". When I fine-tune from the pretrained model above as described in this repository, both the LoRA adapter and the merged model end up with "LlamaForCausalLM" rather than "LlavaLlamaForCausalLM". What mistake am I making during fine-tuning?
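
For reference, the mismatch between the two published checkpoints can be confirmed with a minimal sketch that reads each config.json directly (plain JSON, so the custom LlavaLlamaForCausalLM class does not need to be importable):

```python
# Fetch and compare the "architectures" field of both published checkpoints.
import json
from huggingface_hub import hf_hub_download

for repo in (
    "weizhiwang/llava-v1.5-llama-3-8b-pretrain-clip-large-336px",
    "weizhiwang/LLaVA-Llama-3-8B",
):
    cfg_path = hf_hub_download(repo_id=repo, filename="config.json")
    with open(cfg_path) as f:
        cfg = json.load(f)
    print(repo, "->", cfg.get("architectures"))
```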

I am doing LoRA fine-tuning, and the DeepSpeed command is as below:
```shell
deepspeed --num_gpus=1 /home/srikanth/api-webapp/LLaVA-Llama-3/llava/train/train_mem.py \
    --lora_enable True --lora_r 16 --lora_alpha 32 --mm_projector_lr 2e-5 \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --deepspeed /home/srikanth/api-webapp/LLaVA-Llama-3/scripts/zero3.json \
    --version v3 \
    --data_path /mnt/e/Vision-Finetuning/data/llava_instruct_80k.json \
    --image_folder /mnt/e/Vision-Finetuning/data/images/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter /home/srikanth/api-webapp/checkpoints/llava-llama-8B/llava-v1.5-llama-3-8b-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir /home/srikanth/api-webapp/checkpoints/llava-llama-8B \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --max_steps 100 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03
```
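
To check whether the wrong class name already appears before the merge step, the files the trainer writes to --output_dir can be inspected; this sketch assumes the usual LLaVA LoRA artifacts (a config.json for the model plus a PEFT adapter_config.json), and keys missing from a file print as None:

```python
# Inspect what the LoRA run wrote, assuming the --output_dir layout above.
import json
import os

out_dir = "/home/srikanth/api-webapp/checkpoints/llava-llama-8B"  # --output_dir above
for name in ("config.json", "adapter_config.json"):
    path = os.path.join(out_dir, name)
    if not os.path.exists(path):
        print(name, "-> missing")
        continue
    with open(path) as f:
        cfg = json.load(f)
    print(name, "->",
          "architectures:", cfg.get("architectures"),
          "model_type:", cfg.get("model_type"),
          "base_model:", cfg.get("base_model_name_or_path"))
```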

My merge command is as below:
```shell
python /home/srikanth/api-webapp/LLaVA-Llama-3/scripts/merge_lora_weights.py \
    --model-path /home/srikanth/api-webapp/checkpoints/llava-llama-8B \
    --model-base meta-llama/Meta-Llama-3-8B-Instruct \
    --save-model-path /home/srikanth/api-webapp/multimodal-llava-llama-8B
```
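
If the wrong class name only shows up after merging, one stopgap is to patch the merged config.json by hand; this is an unverified workaround sketch on my part, not something from this repository's docs, and it does not address the root cause:

```python
# Workaround sketch, NOT a root-cause fix: rewrite "architectures" in the
# merged model's config.json. The target class name is taken from the issue
# above; compare against weizhiwang/LLaVA-Llama-3-8B's config.json before use.
import json

cfg_path = "/home/srikanth/api-webapp/multimodal-llava-llama-8B/config.json"  # --save-model-path above
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["architectures"] = ["LlavaLlamaForCausalLM"]

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```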