
What hardware configuration is required for training?


Great work!

I would like to ask about the specifics of the training process: the type of GPU used (e.g., 3090, V100), the number of GPUs, and the training duration in days. Could you provide this information?


Thanks for your interest in our work!

Each training step takes two to three days on six A6000 GPUs with a batch size of one. We use a batch size of one because the code that gathers the image and language parts has a technical issue. The inference code has the same issue, so in principle it should also be run with a batch size of one.
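For anyone budgeting compute on different hardware, the figures above can be turned into a rough GPU-hour estimate. This is only a back-of-the-envelope sketch: it assumes the two to three days are wall-clock time with all six GPUs busy throughout, and the helper name `gpu_hours` is made up for illustration.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """GPU-hours for a run that keeps `num_gpus` busy for `days` of wall-clock time."""
    return num_gpus * days * 24


NUM_GPUS = 6  # six A6000 GPUs, as stated in the reply

# Assumption: the quoted two to three days is wall-clock time per training step.
low = gpu_hours(NUM_GPUS, 2)
high = gpu_hours(NUM_GPUS, 3)
print(f"{low:.0f} to {high:.0f} GPU-hours per training step")
```

The estimate says nothing about how the time would scale on other GPU models, and the batch size of one may limit how well additional GPUs help.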