replicate/replicate-python

CUDA kernel error

vianseto opened this issue · 3 comments

I attempted to run inference in three chained steps: model A -> model B -> model C. When the pipeline reaches model C, the following error occurs:

replicate.exceptions.ModelError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with 'TORCH_USE_CUDA_DSA' to enable device-side assertions.
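Roughly, my code looks like this (a simplified sketch; the model identifiers below are placeholders, not the actual models):

import replicate

# Three chained predictions; each step feeds the previous output forward.
# The "owner/model-x:version" identifiers are placeholders.
out_a = replicate.run("owner/model-a:version", input={"prompt": "..."})
out_b = replicate.run("owner/model-b:version", input={"image": out_a})
out_c = replicate.run("owner/model-c:version", input={"image": out_b})  # fails here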

Any help would be appreciated.

mattt commented

Hi, @vianseto. Could you please be more specific about what you're doing? How exactly are you chaining inference across models? Which models are you using? Does this happen all the time or just some of the time? Please share any code that might be helpful to debug the issue.

pavansai26 commented

@vianseto Set the CUDA_LAUNCH_BLOCKING environment variable to 1 before running your code. This makes CUDA operations synchronous, so the reported stacktrace should point at the call that actually failed.

export CUDA_LAUNCH_BLOCKING=1
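If you launch from Python rather than a shell, here is a minimal sketch (assuming the failing step initializes CUDA through PyTorch):

import os

# CUDA_LAUNCH_BLOCKING is read when the CUDA context is created,
# so set it before the first CUDA call (typically before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch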

Verify that your GPU has enough free memory to handle the operations. You can use tools like nvidia-smi to monitor GPU memory usage.
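If the failing step runs PyTorch locally, you can also check free memory programmatically; a sketch, assuming a recent PyTorch release that provides torch.cuda.mem_get_info:

import torch

# Returns (free, total) memory in bytes for the current CUDA device.
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1e9:.2f} GB free of {total / 1e9:.2f} GB")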

Also try installing the latest versions of the frameworks involved; CUDA errors like this are sometimes fixed in newer releases.
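For example (the package names here assume PyTorch and the Replicate Python client; adjust for your setup):

pip install --upgrade torch replicate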

mattt commented

@vianseto Let me know if you're still having this problem, and I'd be happy to reopen.

@pavansai26 Thanks for sharing those suggestions!