lzzcd001/MeshDiffusion

Asking for error solutions

Closed this issue · 2 comments

Hello, I want to learn from this project, and I have successfully configured the runtime environment. My environment is as follows:
CUDA: 11.3
torch: 1.12.0+cu113
torchaudio: 0.12.0+cu113
torchvision: 0.13.0+cu113

After I finished preparing the dataset, I ran the training command, but it failed with the following error message:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I have searched for many possible fixes, but none of them solved the problem. Do you know how to solve it?

Thanks in advance.

Do you mean you successfully created a dataset with the DMTet fitting part but encountered the issue in the diffusion model training part?

To me, it seems more like a general issue with your CUDA and PyTorch installation. If the same error message appears elsewhere (e.g., when running the PyTorch tutorial code), I would suggest reinstalling everything and debugging with the tutorial code first, if you have not done so already.
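One way to check for such an installation mismatch: "no kernel image is available" usually means the installed binaries were not compiled for the GPU's compute capability. The following is a minimal sketch, not part of MeshDiffusion, that compares the two using `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names `parse_arch` and `capability_supported` are hypothetical, and the compatibility rule is a simplification of CUDA's binary and PTX compatibility rules.

```python
def parse_arch(arch):
    """Split an entry such as 'sm_86' or 'compute_86' into (kind, major, minor)."""
    kind, _, digits = arch.partition("_")
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop suffixes like 'a'
    return kind, int(digits[:-1]), int(digits[-1])


def capability_supported(capability, arch_list):
    """Simplified check (an assumption, not PyTorch's own logic):
    - 'sm_XY' binaries run on devices with the same major version and minor >= Y;
    - 'compute_XY' PTX can be JIT-compiled for any device with capability >= X.Y.
    """
    major, minor = capability
    for arch in arch_list:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Print whether the installed PyTorch build appears to support GPU 0."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    if not torch.cuda.is_available():
        print("PyTorch cannot see a CUDA device.")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
    print("Architectures in this PyTorch build:", arch_list)
    print("Looks compatible:", capability_supported(capability, arch_list))


if __name__ == "__main__":
    report()
```

If this reports an incompatibility, the mismatch would also affect any CUDA extension compiled against the same toolkit, so the remedy is typically a PyTorch build (and extension rebuild) that targets the GPU's architecture.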

In addition, I would recommend setting CUDA_LAUNCH_BLOCKING=1 before python ..., as suggested by the error message you showed, to find the true source of the error.
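The `VAR=1 python ...` prefix is shell-specific (it does not work as-is in the Windows command prompt), so a portable alternative is to set the variable from Python and launch the script as a child process. This is only a sketch; `blocking_env` and `run_blocking` are hypothetical helper names, and the script arguments are whatever you would normally pass after `python`.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1 added."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(args):
    """Run `python <args...>` with synchronous CUDA kernel launches.

    Returns the child's exit code. With blocking launches, the Python
    traceback should point at the call that actually failed.
    """
    completed = subprocess.run([sys.executable, *args], env=blocking_env())
    return completed.returncode
```

For example, `run_blocking(["fit_dmtets.py", "--index", "0"])` would start that script with the variable set, assuming the remaining arguments are filled in as in your usual command.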

Thank you for your answer. I will keep trying to solve it.
The problem occurs when I run these commands:

cd nvdiffrec
python fit_dmtets.py --config configs\res64.json --meta-path data\model.json --out-dir output --index 0 --split-size 100000

I hope to share the follow-up results with you. Thanks a lot.