chengzeyi/stable-fast

Error when running on A100 Replicate

Opened this issue · 3 comments

I tried several machines on Replicate (https://replicate.com/).

My scripts work well on an A40 GPU, and also on RunPod. But when I switch to an A100 in the Replicate playground, I get the error below. Is this something related to Replicate itself, or do I have to change my CUDA version when building the Docker image?

Thanks for reading!


The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
    %y = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
         ~~~~~~~~~~~~ <--- HERE
    return (%y)
RuntimeError: AssertionError: libcuda.so cannot found!

At:
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/common/build.py(30): libcuda_dirs
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/common/build.py(61): _build
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/compiler/make_launcher.py(39): make_stub
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/compiler/compiler.py(425): compile
  (63): group_norm_4d_channels_last_forward_collect_stats_kernel
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/triton/__init__.py(35): new_func
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/runtime/autotuner.py(232): run
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/triton/runtime/autotuner.py(232): run
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/triton/ops/group_norm.py(430): group_norm_forward
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/triton/torch_ops.py(193): forward
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/autograd/function.py(539): apply
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/triton/torch_ops.py(230): group_norm_silu
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/jit/trace_helper.py(133): forward
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/jit/trace_helper.py(64): wrapper
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/cuda/graphs.py(90): make_graphed_callable
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/cuda/graphs.py(61): simple_make_graphed_callable
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/sfast/cuda/graphs.py(40): dynamic_graphed_callable
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py(918): __call__
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /src/predict.py(243): predict
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/server/worker.py(217): _predict
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/server/worker.py(207): _loop
  /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/cog/server/worker.py(175): run
  /root/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/process.py(314): _bootstrap
  /root/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py(129): _main
  /root/.pyenv/versions/3.10.13/lib/python3.10/multiprocessing/spawn.py(116): spawn_main
  (1):

Either CUDA is not installed, or the CUDA variables are not present in the PATH.
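A quick way to check this from inside the running container is a sketch like the one below. It only assumes the Triton 2.x module layout that appears in the traceback above; everything else is standard library.

    # Diagnostic sketch: print the CUDA-related environment and ask Triton itself
    # where it thinks libcuda.so lives (module path taken from the traceback above).
    import os
    import shutil
    from ctypes.util import find_library

    for var in ("CUDA_HOME", "CUDA_PATH", "PATH", "LD_LIBRARY_PATH"):
        print(var, "=", os.environ.get(var))

    print("nvcc on PATH:", shutil.which("nvcc"))
    print("libcuda via ctypes:", find_library("cuda"))  # None -> driver library not discoverable

    try:
        # Same function that raises the AssertionError in the traceback.
        from triton.common.build import libcuda_dirs
        print("Triton libcuda dirs:", libcuda_dirs())
    except Exception as exc:
        print("Triton lookup failed:", exc)

If the Triton lookup fails here too, the problem is the image/environment on that machine type rather than your model code.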

@SuperSecureHuman do you know how to set the CUDA path so that it is the same across all machine types?

@quocanh34 Please set CUDA_HOME and CUDA_PATH so that the OpenAI Triton compiler can work.
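For example, you could export them in your Dockerfile, or set them at the top of /src/predict.py before anything imports torch / stable-fast. The snippet below is only a sketch: it assumes CUDA is installed under /usr/local/cuda inside the image, so adjust the path to match your build.

    # Sketch for the top of /src/predict.py (install location assumed, adjust as needed).
    import os

    CUDA_ROOT = "/usr/local/cuda"  # wherever CUDA actually lives in your image
    os.environ.setdefault("CUDA_HOME", CUDA_ROOT)
    os.environ.setdefault("CUDA_PATH", CUDA_ROOT)
    # Also make the CUDA binaries and libraries reachable, as mentioned above.
    os.environ["PATH"] = os.pathsep.join(
        filter(None, [os.environ.get("PATH"), os.path.join(CUDA_ROOT, "bin")]))
    os.environ["LD_LIBRARY_PATH"] = os.pathsep.join(
        filter(None, [os.environ.get("LD_LIBRARY_PATH"), os.path.join(CUDA_ROOT, "lib64")]))

    # Import torch / sfast only after the environment is configured,
    # so the Triton compiler sees these variables when it builds its kernels.

Setting the same variables with ENV in the Dockerfile keeps the behaviour identical regardless of which GPU machine type the container lands on.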