FreedomIntelligence/ALLaVA

Ablation on the trainability of LM at pretraining stage

Closed this issue · 6 comments

Section 4.1 says: "We train the projector and LM backbone, and freeze the vision encoder at both stages."
Do you train the LLM in the first stage?

Yes. The trainability of each module is the same at both stages (as summarized in Table 2).

Thanks for pointing this out! We'll state it more clearly in the paper.
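To make the freezing scheme concrete, here is a minimal PyTorch sketch. The module names and shapes are illustrative stand-ins, not ALLaVA's actual classes; the only fact taken from this thread is that the vision encoder is frozen while the projector and LM backbone are trained at both stages.

```python
import torch.nn as nn

class ToyVLM(nn.Module):
    """Toy stand-in for a vision-language model (hypothetical module names)."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(32, 16)  # stands in for e.g. a ViT
        self.projector = nn.Linear(16, 8)        # maps vision features to LM space
        self.lm_backbone = nn.Linear(8, 8)       # stands in for the LLM

def set_trainability(model: ToyVLM) -> None:
    """Freeze the vision encoder; keep projector + LM trainable (both stages)."""
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    for p in model.projector.parameters():
        p.requires_grad = True
    for p in model.lm_backbone.parameters():
        p.requires_grad = True

model = ToyVLM()
set_trainability(model)
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
```

Only parameters with `requires_grad=True` receive gradients, so passing `model.parameters()` to the optimizer still updates just the projector and LM backbone.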

That may be a little unfair: the LLaVA-based models you compare against did not train the LLM in the first stage. Have you run further ablation experiments? My concern is the trainable parameters rather than dataset validity.

Hi @LinB203 ,

Thanks for your question. This is indeed a necessary ablation that we missed.

  • Models we should compare with

    Among the three 3B-scale models compared in our paper, TinyGPT-V adopts a 4-stage training pipeline, while MobileVLM and LLaVA-Phi both adopt LLaVA-like training pipelines. Hence, to address your concern, we compare only against the LLaVA-like models.

  • Experiment Setting

    For a fair comparison, we train only the projector at PT and adjust the peak LR at PT to 1e-3. All other settings follow those of ALLaVA-Longer. The resulting model is named ALLaVA-only_proj@PT.

  • Validity of our ALLaVA-4V dataset

    The validity of our dataset is demonstrated in two ways. Results are detailed in the table below.

    • Comparison with MobileVLM and LLaVA-Phi

      ALLaVA-only_proj@PT performs significantly better than MobileVLM and LLaVA-Phi, but in general slightly worse than ALLaVA-Longer due to the reduced number of trainable parameters at PT.

    • Comparison with models with larger scale

      ALLaVA-only_proj@PT also performs on par with some 7B models on several benchmarks.

| Model | MMB | SEED (v1, img) | MM-Vet | MME | TextVQA | GQA |
|---|---|---|---|---|---|---|
| Qwen-VL-Chat | 60.6 | 65.4 | - | 1487.5 | 61.5 | 57.5 |
| LLaVA-v1.5-7B | 64.3 | - | 31.1 | 1510.7 | 58.2 | 62.0 |
| MobileVLM | 59.6 | - | - | 1288.9 | 47.5 | - |
| LLaVA-Phi | 59.8 | - | 28.9 | 1335.1 | 48.6 | - |
| ALLaVA-Longer | 64.6 | 65.6 | 35.5 | 1564.6 | 50.3 | 50.0 |
| ALLaVA-only_proj@PT | 65.3 | 64.4 | 33.7 | 1557.1 | 50.3 | 49.1 |
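The two PT recipes being contrasted above can be summarized in a short sketch. The config keys and helper function are illustrative, not the repo's actual config schema; the only values taken from this thread are which modules are trainable and the adjusted 1e-3 peak LR.

```python
# Hypothetical config sketch contrasting the two pretraining (PT) recipes.
ALL_MODULES = ["vision_encoder", "projector", "lm_backbone"]

allava_longer_pt = {
    # ALLaVA-Longer: vision encoder frozen, projector + LM trained at PT.
    "trainable_modules": ["projector", "lm_backbone"],
}

only_proj_pt = {
    # ALLaVA-only_proj@PT: LLaVA-like, so the LM is also frozen at PT.
    "trainable_modules": ["projector"],
    "peak_lr": 1e-3,  # adjusted value stated in the thread
}

def frozen_modules(cfg: dict) -> list[str]:
    """Modules excluded from the optimizer under a given PT config."""
    return [m for m in ALL_MODULES if m not in cfg["trainable_modules"]]
```

Under this sketch, the ablation differs from ALLaVA-Longer only in freezing the LM backbone at PT (and the accompanying LR change), which isolates the effect of trainable parameters from that of the dataset.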

Hope this ablation addresses your concern!

Best,
Guiming

Great work! I'll use your data!

Thanks! We are releasing the LAION images to Hugging Face in a few days. Stay tuned!

We have released the LAION images as well as the download scripts.