threestudio-project/threestudio

[Zero123] GPU not fully used and training gets stuck

csyhping opened this issue · 3 comments

Hi @bennyguo, thanks for your excellent work. I tried to test Zero123/Stable Zero123, but the training always gets stuck. I've tried the suggestions from every related issue: running on one or two 3090s, using different images from the example folder, etc. The training still hangs, and GPU memory usage is really low (~5000MiB / 24GiB; with 2 GPUs, ~5000MiB / 24GiB each).

I saw some other people also have such an issue; do you have any idea about this? Thanks!!

I tried the multi-view-only example and it works, but the original 3D version still fails.

Finally solved...

For me, it was a nerfacc version issue: the code was stuck at nerfacc.estimator.sampling. Reinstalling with pip install nerfacc==0.5.2 solved my problem. Related issue: stuck at nerfacc utility #23
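Before reinstalling, it may help to confirm which nerfacc version is actually installed in the active environment. The following is a minimal sketch using only the Python standard library; the helper names installed_version and matches_pin are hypothetical and not part of threestudio or nerfacc, and 0.5.2 is simply the version that worked in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """True only if `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # 0.5.2 is the version reported to work in this issue (an assumption
    # for other setups; it may not be the right pin for every environment).
    print("nerfacc installed:", installed_version("nerfacc"))
    print("matches 0.5.2:", matches_pin("nerfacc", "0.5.2"))
```

Running this in the same environment used for training shows whether the pin took effect; if it prints None, the package is not installed in that environment at all.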

For anyone with a similar issue, try using a debugger such as ipdb to find where the hang occurs; sometimes it is not a GPU problem, even though it looks like one.
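When it is not obvious where to set a breakpoint, one complementary option (not what was used in this thread, which relied on ipdb) is Python's standard faulthandler module, which can print the stack of every thread if the process is still running after a timeout. This is a minimal sketch; the function names arm_hang_watchdog and disarm_hang_watchdog and the 120-second default are hypothetical choices for illustration.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=120.0, file=sys.stderr):
    """Dump all thread stacks to `file` if still running after `timeout_s`.

    `file` must stay open until the watchdog fires or is disarmed.
    With repeat=True the dump is written again every `timeout_s` seconds,
    so a long hang keeps reporting the same stuck frame.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disarm_hang_watchdog():
    """Cancel the pending dump, e.g. once training is progressing normally."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage near the top of a training entry point:
#   arm_hang_watchdog(120.0)
#   ... start training ...
# If the run hangs, the dumped traceback shows the Python frame it is stuck in.
```

The dump only shows Python-level frames, so a hang inside a compiled extension appears as the Python call that entered it, which is usually enough to identify the responsible package.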

I hope this helps.