Docker GPU Issues

Question

bgiffo96 opened this issue 3 months ago · 7 comments

After setting up Grounded-SAM-2 using the docker container provided and running the demo script:

/home/appuser/Grounded-SAM-2# python grounded_sam2_local_demo.py

I am met with the following warning and error:

UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905979055/work/aten/src/ATen/native/TensorShape.cpp:3587.)
final text_encoder_type: bert-base-uncased
model.safetensors: 100%|██████████| 440M/440M [00:05<00:00, 56.5MB/s]
UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905979055/work/torch/csrc/utils/tensor_numpy.cpp:206.)
UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
UserWarning: None of the inputs have requires_grad=True. Gradients will be None
Traceback (most recent call last):
File "/home/appuser/Grounded-SAM-2/grounded_sam2_local_demo.py", line 58, in <module>
boxes, confidences, labels = predict(
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/util/inference.py", line 68, in predict
outputs = model(image[None], captions=[caption])
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/groundingdino.py", line 327, in forward
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
memory, memory_text = self.encoder(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
output = checkpoint.checkpoint(
File "/opt/conda/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
src2 = self.self_attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 338, in forward
output = MultiScaleDeformableAttnFunction.apply(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined

The container appears to build with GPU support correctly. Below are the results of the tests I have run to look for discrepancies:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
torch: 2.3 ; cuda: 2.3.1

print(torch.cuda.is_available())
True

print(torch.cuda.device_count())
1

print(torch.cuda.current_device())
0

print(torch.cuda.device(0))
<torch.cuda.device object at 0x7a10b3552080>

print(torch.cuda.get_device_name(0))
NVIDIA GeForce RTX 4090
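
One detail about the version check above: the `cuda: 2.3.1` it prints is the PyTorch version again, not a CUDA version. `split("+")[-1]` only yields a CUDA tag when the version string carries a `+cu...` suffix, and the string here evidently has none. A minimal sketch of a more defensive parse follows; `split_torch_version` is a hypothetical helper written for illustration, and the suffixed string is an illustrative value rather than output from this thread. `torch.version.cuda` is the attribute PyTorch exposes for the CUDA version it was built against.

```python
def split_torch_version(version: str):
    """Split a torch.__version__ string into (major.minor, local build tag or None)."""
    public, _, local = version.partition("+")
    major_minor = ".".join(public.split(".")[:2])
    return major_minor, (local or None)


# The string reported in this thread has no "+cu..." suffix, so there is no CUDA tag.
print(split_torch_version("2.3.1"))
# A suffixed string (illustrative value, not taken from this thread).
print(split_torch_version("2.3.1+cu121"))

try:
    import torch

    # torch.version.cuda is the CUDA version PyTorch was compiled against (None for CPU builds).
    print("torch:", torch.__version__, "; built for CUDA:", torch.version.cuda)
except ImportError:
    print("torch is not installed in this interpreter")
```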

The host machine runs Ubuntu 24.04 with CUDA 12.6, if that is of any relevance.

I cannot for the life of me figure out what is causing the issue or how to solve it. Any support would be greatly appreciated.

Answer 1 · 2024-10-09T07:30:50.000Z

I have tested the dockerfile on another device and recreated the error following the same set up procedure described in the README.

/home/appuser/Grounded-SAM-2# python grounded_sam2_local_demo.py
UserWarning: Flash Attention is disabled as it requires a GPU with Ampere (8.0) CUDA capability.
UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905979055/work/aten/src/ATen/native/TensorShape.cpp:3587.)
final text_encoder_type: bert-base-uncased
UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1716905979055/work/torch/csrc/utils/tensor_numpy.cpp:206.)
UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
UserWarning: None of the inputs have requires_grad=True. Gradients will be None
Traceback (most recent call last):
File "/home/appuser/Grounded-SAM-2/grounded_sam2_local_demo.py", line 58, in <module>
boxes, confidences, labels = predict(
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/util/inference.py", line 68, in predict
outputs = model(image[None], captions=[caption])
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/groundingdino.py", line 327, in forward
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
memory, memory_text = self.encoder(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
output = checkpoint.checkpoint(
File "/opt/conda/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
src2 = self.self_attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 338, in forward
output = MultiScaleDeformableAttnFunction.apply(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/appuser/Grounded-SAM-2/grounding_dino/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 53, in forward
output = _C.ms_deform_attn_forward(
NameError: name '_C' is not defined

TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
torch: 2.3 ; cuda: 2.3.1

print(torch.cuda.is_available())
True
print(torch.cuda.device_count())
1
print(torch.cuda.current_device())
0
print(torch.cuda.device(0))
<torch.cuda.device object at 0x735d5cb464d0>
print(torch.cuda.get_device_name(0))
NVIDIA GeForce RTX 2070 SUPER

Again, I am running the host machine with Ubuntu 24.04 with CUDA 12.2 if that is of any relevance.

This issue seems to be known, as the Makefile references a similar issue (IDEA-Research/Grounded-Segment-Anything#84); however, that reference suggests the problem has been addressed.

Additionally, I have tried setting up Grounded-Segment-Anything and receive the same error, again on both devices. I was initially trying to set up Grounded-Segment-Anything and then tried Grounded-SAM-2, as I thought the issue could have come from the version of CUDA being used.

Answer 2 · 2024-10-16T10:43:58.000Z

Hi there,

This error typically occurs due to incorrect compilation of custom CUDA/C++ extensions or incompatibilities between PyTorch versions, CUDA, and custom operations. I can confirm that I am using Ubuntu 20.04, and the Docker container works correctly in that environment. If you are running Ubuntu 24.04, I recommend updating the Dockerfile to ensure compatibility with your system.

Best regards!
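
The `NameError` appears to follow directly from the earlier warning: `Failed to load custom C++ ops` suggests that importing the compiled extension failed, after which the name `_C` is never bound. To see the underlying import error rather than the generic warning, a sketch like the following can be run inside the container. The module name `groundingdino._C` is inferred from the traceback, and `try_import` is a hypothetical helper.

```python
import importlib


def try_import(module_name: str):
    """Return (True, module file) on success or (False, error text) on failure."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or OSError from a missing shared library
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "<built-in>")


# Module name inferred from the traceback; adjust it if your install differs.
ok, detail = try_import("groundingdino._C")
print("compiled extension loaded:" if ok else "compiled extension failed:", detail)
```

If the import fails, the printed exception text (for example an undefined symbol or a missing module) usually says more about the cause than the warning does.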

Answer 3 · 2024-10-18T05:49:40.000Z

Hi! I'm using Ubuntu 20.04 with CUDA version 12.1, and I'm facing the same issue. I read this and they suggested setting CUDA_HOME, which I also did. Are there any updates on this? Thanks!
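
On `CUDA_HOME`: setting the variable only helps if it points at a toolkit directory that actually contains `nvcc`. A rough way to check what a build would see, assuming the conventional `bin/nvcc` layout of a CUDA toolkit install (`describe_cuda_home` is a hypothetical helper):

```python
import os


def describe_cuda_home(env=None):
    """Report CUDA_HOME and whether an nvcc binary exists under its bin/ directory."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return {"cuda_home": None, "nvcc_found": False}
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    return {"cuda_home": cuda_home, "nvcc_found": os.path.isfile(nvcc)}


print(describe_cuda_home())

try:
    # CUDA_HOME as resolved by PyTorch's extension builder; it may differ from the env var.
    from torch.utils.cpp_extension import CUDA_HOME

    print("torch.utils.cpp_extension.CUDA_HOME:", CUDA_HOME)
except ImportError:
    print("torch is not installed in this interpreter")
```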

Answer 4 · 2024-10-28T15:08:31.000Z

Hello!

It seems you’re using the GroundingDINO model locally instead of through Hugging Face. I recommend trying the Hugging Face version for easier setup and compatibility. Alternatively, if you prefer running it locally, please follow the detailed instructions for building and setting up the model provided here: Grounded Segment Anything - GitHub.

Let me know if you need further assistance!

Answer 5 · 2024-11-08T05:44:26.000Z

To share some of my observations, I have faced the same error NameError: name '_C' is not defined when running the grounded_sam2_local_demo.py file. However, I can run grounded_sam2_hf_model_demo.py successfully without any errors.

Also, the Ubuntu version does not seem to matter: I have experimented with both 20.04 and 22.04 and made the same observations on each.

Answer 6 · 2024-11-12T21:44:46.000Z

I also experienced a similar issue while running grounded_sam2_local_demo.py within the docker container. My digging seemed to confirm the issue was related to the grounding_dino install.

For whatever reason, the issue resolved itself when I re-ran the following in the command line within the docker container: /home/appuser/Grounded-SAM-2# python -m pip install --no-build-isolation -e grounding_dino
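
The reinstall step can also be scripted so that the result is verified straight away. This is only a sketch under assumptions: the repository path is the one shown in the prompts in this thread, the import `from groundingdino import _C` is inferred from the traceback, and `reinstall_command` and `reinstall_and_verify` are hypothetical helper names.

```python
import subprocess
import sys


def reinstall_command(python_exe: str = sys.executable):
    """Build the argv list for the editable reinstall quoted in this thread."""
    return [python_exe, "-m", "pip", "install", "--no-build-isolation", "-e", "grounding_dino"]


def reinstall_and_verify(repo_dir: str = "/home/appuser/Grounded-SAM-2") -> bool:
    """Re-run the install from repo_dir, then try importing the extension in a fresh process."""
    subprocess.run(reinstall_command(), cwd=repo_dir, check=True)
    # A fresh interpreter avoids reusing a failed import cached in the current process.
    probe = subprocess.run([sys.executable, "-c", "from groundingdino import _C"], cwd=repo_dir)
    return probe.returncode == 0
```

Calling `reinstall_and_verify()` inside the running container returns `True` only if the compiled extension imports cleanly after the rebuild.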

Answer 7 · 2024-11-14T10:53:51.000Z

@amerk12 - Thank you very much for your reply. After testing this on 2 different machines and 3 operating systems, this solution works consistently.

I had previously seen other suggestions to do this and thought I had tested it to no avail. Clearly I had broken other parts of the system while trying to solve the issue.

@ShuoShenDe - It might be worth adding a note to the README suggesting this as a fix for this common issue.