ShihaoZhaoZSH/LaVi-Bridge

Training and Fine-tuning hardware requirements

kubernetes-bad opened this issue · 1 comment

Exciting paper! Thank you for doing this research and publishing it.

Do you want to share some insight on what type of compute is required for training LaVi-Bridge?

Since you used around 2M text-image pairs, it sounds like training from scratch would need a cluster of GPUs (please correct me if I'm wrong!). Is fine-tuning the adapter and LoRAs something that can be done on a smaller, domain-specific dataset? I'd be curious what kind of compute that would require.

Thanks!

Thank you for your interest in LaVi-Bridge! As mentioned in the paper, we train on around 1 million text-image pairs using 8 A100 GPUs for less than 2 days, with a batch size of 256. By reducing the batch size, or by employing strategies such as mixed-precision training or gradient accumulation, it is possible to train the model with fewer computational resources. The appendix of the paper also reports performance as training steps progress, which you can refer to for further information.
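
For reference, here is a minimal PyTorch sketch of the two strategies mentioned above (gradient accumulation plus mixed precision), not the repository's actual training code: a micro-batch of 16 accumulated over 16 steps reproduces the paper's effective batch size of 256 on a single GPU. The model, data loader, and loss below are placeholders, and a CUDA device is assumed.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Placeholder model/optimizer; LaVi-Bridge's real training script differs.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()

micro_batch = 16                             # what fits on the available GPU
target_batch = 256                           # effective batch size from the paper
accum_steps = target_batch // micro_batch    # 16 micro-steps per optimizer update

# Dummy data standing in for text-image pairs.
loader = [(torch.randn(micro_batch, 512), torch.randn(micro_batch, 512))
          for _ in range(64)]

optimizer.zero_grad()
for step, (x, target) in enumerate(loader):
    with autocast():                         # mixed-precision forward pass
        loss = nn.functional.mse_loss(model(x.cuda()), target.cuda())
    # Divide by accum_steps so gradients average over the accumulation window.
    scaler.scale(loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)               # unscales grads; skips step on inf/NaN
        scaler.update()
        optimizer.zero_grad()
```

With this pattern, memory usage scales with the micro-batch while the gradient statistics match the larger effective batch, at the cost of proportionally more forward/backward passes per update.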