crockwell/Cap3D

GPU requirements


I greatly appreciate the Cap3D project and am thankful that you have made it publicly available. It's an invaluable resource for the community.

Could you please advise on the GPU requirements for training the Cap3D model? Additionally, is it feasible to train it using Google Colab Pro's GPU resources? Any guidance on configurations or adjustments needed would be greatly appreciated.

Thanks!

Hi Haider,

Thanks for the kind words! So to be clear, the captioning in Cap3D was inference only -- we use pretrained BLIP2, CLIP, and GPT4 models. If I recall correctly, BLIP2 required at least 20GB of memory to run inference. We also used an A40 (48GB) to train the text-to-3D models (e.g. PointE, ShapE). I don't personally have experience with Colab Pro, but I suspect it may only give you 12GB of memory. You may want to do some Googling to see if there is a config with enough memory, or run a few setups and see what fits.
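
For what it's worth, here's a minimal sketch (assuming a PyTorch runtime) for checking which GPU Colab actually assigned you and how much memory it has, before committing to a full run:

```python
import torch

# Report the assigned GPU and its total memory so you can judge whether
# a ~20GB BLIP2 inference run or a PointE/ShapE training run will fit.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024 ** 3
    print(f"GPU: {props.name}, total memory: {total_gb:.1f} GB")
else:
    print("No CUDA GPU available on this runtime.")
```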

Hope that helps!
Chris

Hi Crockwell,

Thanks for the quick update. I checked Google Colab Pro and it provides an A100 GPU, which has around 40GB of memory. Since you used 48GB, I think 40GB should be sufficient as well. I also tried training on my 1070 (8GB) GPU with a batch size of 2, and it started working, though the process was too slow and used 100 percent of my GPU memory.

Thanks for the information! I'll close this issue, feel free to open if you have further questions.

Hi @crockwell,

I have a query, and I would really appreciate it if you could help with it as soon as possible.

I started training on Colab Pro with an L4 (22GB) GPU and a batch size of 8. I generated around 10,000 .obj files using Blender and later converted them to .pt files.

I trained for 25 epochs, but the results are not even close to what I expected. The models generated when running inference with the trained checkpoint are completely distorted, and I think it's due to the conversion from .obj to .pt.

Could you please share any insight on this .obj to .pt conversion, specifically for ShapE?

Thanks!

Hi Haider,

As noted in our experiments, finetuning ShapE did not bring a meaningful benefit, so it may be just as well to use the pretrained checkpoint and not worry about finetuning. Hopefully that will give you a result closer to what you expect. Feel free to open a new issue with full details if you continue to struggle.
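
If it helps, here's a rough sketch of sampling from the pretrained text-conditioned checkpoint, adapted from the example notebook in OpenAI's shap-e repo (the prompt and sampler settings below are just placeholders, so treat this as a starting point rather than our exact setup):

```python
import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Pretrained latent decoder ("transmitter") and text-conditioned diffusion model.
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

# Sample latents for a text prompt; decode them with xm afterwards to get meshes/renders.
latents = sample_latents(
    batch_size=1,
    model=model,
    diffusion=diffusion,
    guidance_scale=15.0,
    model_kwargs=dict(texts=["a red chair"]),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)
```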

Best,
Chris