Ablation on the trainability of the LM at the pretraining stage
Section 4.1 states: "We train the projector and LM backbone, and freeze the vision encoder at both stages."
Do you train the LLM in the first stage?
Yes. The trainability of each module is the same for both stages (as summarized in Table 2).
Thanks for pointing this out! We'll state this more clearly in the paper.
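To make the setup concrete, here is a minimal PyTorch sketch of the freezing scheme; the submodule names (`vision_encoder`, `projector`, `language_model`) are hypothetical placeholders, not the actual attribute names in our codebase:

```python
import torch.nn as nn

def set_trainability(model: nn.Module,
                     train_projector: bool = True,
                     train_lm: bool = True) -> None:
    """Set which submodules receive gradients.

    The vision encoder stays frozen at both stages, matching the paper.
    """
    for p in model.vision_encoder.parameters():   # hypothetical attribute name
        p.requires_grad = False
    for p in model.projector.parameters():        # hypothetical attribute name
        p.requires_grad = train_projector
    for p in model.language_model.parameters():   # hypothetical attribute name
        p.requires_grad = train_lm

# ALLaVA setup (per Table 2): projector + LM trainable at both PT and FT.
# set_trainability(model, train_projector=True, train_lm=True)
```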
Maybe it's a little unfair. The LLaVA-based models you compare with did not train the LLM in the first stage. Have you run further ablation experiments? My concern is the trainable parameters rather than the validity of the dataset.
Hi @LinB203,
Thanks for your question. This is indeed a necessary ablation that we missed.
**Models we should compare with**
Among the three 3B-scale models compared in our paper, TinyGPT-V adopts a 4-stage training pipeline, while MobileVLM and LLaVA-Phi both adopt LLaVA-like training pipelines. Hence, to address your concern, we compare only against the LLaVA-like models.
**Experiment Setting**
For a fair comparison, we train only the projector at PT and adjust the peak LR at PT to 1e-3; all other settings follow ALLaVA-Longer. The resulting model is named ALLaVA-only_proj@PT (see the sketch below).
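Continuing the hypothetical sketch above (same placeholder names), the ablation's PT stage would look roughly like this; AdamW and the cosine schedule are assumptions, since only the peak LR of 1e-3 is specified here:

```python
import torch

# Assumes `model` and `set_trainability` from the earlier sketch.
# Ablation PT stage: freeze the LM as well, so only the projector trains.
set_trainability(model, train_projector=True, train_lm=False)

# Optimize only the trainable parameters, with peak LR 1e-3.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-3)

num_training_steps = 1000  # placeholder; depends on the PT data size
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_training_steps
)
```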
**Validity of our ALLaVA-4V dataset**
The validity of our dataset is demonstrated in two ways, detailed below; results are given in the table at the end.
**1. Comparison with MobileVLM and LLaVA-Phi**
ALLaVA-only_proj@PT performs significantly better than MobileVLM and LLaVA-Phi, but generally slightly worse than ALLaVA-Longer, due to the reduced number of trainable parameters at PT.
**2. Comparison with larger-scale models**
ALLaVA-only_proj@PT also performs on par with some 7B models on several benchmarks.
| Model | MMB | SEED (v1, img) | MM-Vet | MME | TextVQA | GQA |
|---|---|---|---|---|---|---|
| Qwen-VL-Chat | 60.6 | 65.4 | - | 1487.5 | 61.5 | 57.5 |
| LLaVA-v1.5-7B | 64.3 | - | 31.1 | 1510.7 | 58.2 | 62.0 |
| MobileVLM | 59.6 | - | - | 1288.9 | 47.5 | - |
| LLaVA-Phi | 59.8 | - | 28.9 | 1335.1 | 48.6 | - |
| ALLaVA-Longer | 64.6 | 65.6 | 35.5 | 1564.6 | 50.3 | 50.0 |
| ALLaVA-only_proj@PT | 65.3 | 64.4 | 33.7 | 1557.1 | 50.3 | 49.1 |
Hope this ablation addresses your concern!
Best,
Guiming
Great work! I will use your data!
Thanks! We are releasing the LAION images on Hugging Face in a few days. Stay tuned!
We have updated the LAION images as well as the download scripts.
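For anyone fetching the data, here is a minimal sketch using `huggingface_hub`; the repo id below is a placeholder, not the actual Hub path:

```python
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual ALLaVA dataset repo on the Hub.
snapshot_download(
    repo_id="your-org/ALLaVA-4V",
    repo_type="dataset",
    local_dir="./allava_4v",
)
```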