FreedomIntelligence/ALLaVA

Ablation on the trainability of LM at pretraining stage

Closed this issue · 6 comments

Section 4.1 says: "We train the projector and LM backbone, and freeze the vision encoder at both stages."
Do you train the LLM in the first stage?

Yes. The trainability of each module is the same at both stages (as summarized in Table 2).

Thanks for pointing this out! We'll state it more clearly in the paper.
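To make the freezing scheme concrete, here is a minimal PyTorch sketch. The module names and shapes are illustrative stand-ins, not ALLaVA's actual classes; the only fact taken from this thread is that the vision encoder is frozen while the projector and LM backbone are trained at both stages.

```python
import torch.nn as nn

class ToyVLM(nn.Module):
    """Toy stand-in for a vision-language model (hypothetical module names)."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(32, 16)  # stands in for e.g. a ViT
        self.projector = nn.Linear(16, 8)        # maps vision features to LM space
        self.lm_backbone = nn.Linear(8, 8)       # stands in for the LLM

def set_trainability(model: ToyVLM) -> None:
    """Freeze the vision encoder; keep projector + LM trainable (both stages)."""
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    for p in model.projector.parameters():
        p.requires_grad = True
    for p in model.lm_backbone.parameters():
        p.requires_grad = True

model = ToyVLM()
set_trainability(model)
trainable = sorted(n for n, p in model.named_parameters() if p.requires_grad)
```

Only parameters with `requires_grad=True` receive gradients, so passing `model.parameters()` to the optimizer still updates just the projector and LM backbone.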

That may be a little unfair: the LLaVA-based models you compare against did not train the LLM in the first stage. Have you run further ablation experiments? My concern is the trainable parameters rather than dataset validity.

Hi @LinB203 ,

Thanks for your question. This is indeed a necessary ablation that we missed.

  • Models we should compare with

    Among the three 3B-scale models compared in our paper, TinyGPT-V adopts a 4-stage training pipeline, while MobileVLM and LLaVA-Phi both adopt LLaVA-like training pipelines. Hence, to address your concern, we compare only against the LLaVA-like models.

  • Experiment Setting

    For a fair comparison, we train only the projector at PT and adjust the peak LR at PT to 1e-3. All other settings follow those of ALLaVA-Longer. The resulting model is named ALLaVA-only_proj@PT.

  • Validity of our ALLaVA-4V dataset

    The validity of our dataset is demonstrated in two ways. Results are detailed in the table below.

    • Comparison with MobileVLM and LLaVA-Phi

      ALLaVA-only_proj@PT performs significantly better than MobileVLM and LLaVA-Phi, but in general slightly worse than ALLaVA-Longer due to the reduced number of trainable parameters at PT.

    • Comparison with models with larger scale

      ALLaVA-only_proj@PT also performs on par with some 7B models on several benchmarks.

| Model | MMB | SEED (v1, img) | MM-Vet | MME | TextVQA | GQA |
|---|---|---|---|---|---|---|
| Qwen-VL-Chat | 60.6 | 65.4 | - | 1487.5 | 61.5 | 57.5 |
| LLaVA-v1.5-7B | 64.3 | - | 31.1 | 1510.7 | 58.2 | 62.0 |
| MobileVLM | 59.6 | - | - | 1288.9 | 47.5 | - |
| LLaVA-Phi | 59.8 | - | 28.9 | 1335.1 | 48.6 | - |
| ALLaVA-Longer | 64.6 | 65.6 | 35.5 | 1564.6 | 50.3 | 50.0 |
| ALLaVA-only_proj@PT | 65.3 | 64.4 | 33.7 | 1557.1 | 50.3 | 49.1 |
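The two PT recipes being contrasted above can be summarized in a short sketch. The config keys and helper function are illustrative, not the repo's actual config schema; the only values taken from this thread are which modules are trainable and the adjusted 1e-3 peak LR.

```python
# Hypothetical config sketch contrasting the two pretraining (PT) recipes.
ALL_MODULES = ["vision_encoder", "projector", "lm_backbone"]

allava_longer_pt = {
    # ALLaVA-Longer: vision encoder frozen, projector + LM trained at PT.
    "trainable_modules": ["projector", "lm_backbone"],
}

only_proj_pt = {
    # ALLaVA-only_proj@PT: LLaVA-like, so the LM is also frozen at PT.
    "trainable_modules": ["projector"],
    "peak_lr": 1e-3,  # adjusted value stated in the thread
}

def frozen_modules(cfg: dict) -> list[str]:
    """Modules excluded from the optimizer under a given PT config."""
    return [m for m in ALL_MODULES if m not in cfg["trainable_modules"]]
```

Under this sketch, the ablation differs from ALLaVA-Longer only in freezing the LM backbone at PT (and the accompanying LR change), which isolates the effect of trainable parameters from that of the dataset.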

Hope this ablation addresses your concern!

Best,
Guiming

Great work! I'll use your data!

Thanks! We are releasing the LAION images to Hugging Face in a few days. Stay tuned!

We have released the LAION images as well as the download scripts.