SysCV/MaskFreeVIS

scripts/visual_video.sh fails


With my current configuration, which follows the stated requirements, bash scripts/visual_video.sh fails with:

ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
        `cd mask2former/modeling/pixel_decoder/ops`
        `sh make.sh`

even though running make.sh reports a successful install:

Installed /usr/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg
Processing dependencies for MultiScaleDeformableAttention==1.0
Finished processing dependencies for MultiScaleDeformableAttention==1.0

At the same time, inference with detectron2 (python3 detectron2/demo/demo.py) works as expected.

A reproducible configuration would hopefully eliminate the scripts/visual_video.sh failure; a Dockerfile would be ideal.
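
One possible cause, which is an assumption and not confirmed in this thread, is that the script runs under a different Python interpreter than the one make.sh installed the egg into (/usr/lib/python3.8/site-packages in the log above). A minimal sketch for checking this from the same interpreter that runs the demo; the helper name module_location is hypothetical:

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Which interpreter is actually running this check?
    print("interpreter:", sys.executable)
    # Where (if anywhere) does it find the compiled op?
    print("MultiScaleDeformableAttention:",
          module_location("MultiScaleDeformableAttention"))
    # The directories searched on import, for comparison with the install path.
    for entry in sys.path:
        print("  search path:", entry)
```

If the module location prints as None while the egg exists on disk, the install directory is not on this interpreter's search path.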

lkeab commented

What are your CUDA version and GPU type?

@lkeab, the attached Dockerfile helped to eliminate the problem.

There are two minor issues with demo_video/demo.py:

  1. modify line 162 to add a missing parameter:
    predictions, visualized_output = demo.run_on_video(vid_frames, args.confidence_threshold)
  2. modify line 140 by replacing fps=5 with duration=200 to align with imageio==2.31.1 (the same frame rate, assuming duration is given in milliseconds: 1000 / 5 = 200),

and one major issue: demo_video/demo.py with a folder of images as input seems to run OK for up to 7 images; with more images it fails with a CUDA error. With a video as input it fails with a similar error, e.g.

[06/23 13:07:08 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /dataout/MaskFreeVis/mfvis_models/model_final_swinl_0560.pth ...
[06/23 13:07:08 fvcore.common.checkpoint]: [Checkpointer] Loading from /dataout/MaskFreeVis/mfvis_models/model_final_swinl_0560.pth ...
Traceback (most recent call last):
  File "demo_video/demo.py", line 162, in <module>
    predictions, visualized_output = demo.run_on_video(vid_frames, args.confidence_threshold)
  File "/MaskFreeVIS/demo_video/predictor.py", line 46, in run_on_video
    predictions = self.predictor(frames)
  File "/MaskFreeVIS/demo_video/predictor.py", line 111, in __call__
    predictions = self.model([inputs])
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/MaskFreeVIS/demo_video/../mask2former_video/video_maskformer_model.py", line 291, in forward
    features = self.backbone(images.tensor)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 753, in forward
    y = super().forward(x)
  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 672, in forward
    x_out = norm_layer(x_out)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/opt/conda/envs/maskfreevis/lib/python3.8/site-packages/torch/nn/functional.py", line 2347, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
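
The last line of the log suggests CUDA_LAUNCH_BLOCKING=1. Besides exporting it in the shell, it can be set from Python, as long as that happens before CUDA is initialized. A minimal sketch, assuming it is placed at the very top of the entry script, before torch is imported:

```python
import os

# Force synchronous kernel launches so the Python stack trace points at the
# operation that actually failed. Set this before torch initializes CUDA;
# the safest place is before `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and anything that imports it) only after this point

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow execution down, so this is meant for debugging only.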

My environment:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

nvidia-smi
Mon Jun 26 00:24:28 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.89       Driver Version: 513.63       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 3000     On   | 00000000:01:00.0  On |                  N/A |
| N/A   59C    P8    14W /  N/A |    765MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

@lkeab, with a modified Dockerfile that uses CUDA 11.3 (instead of 11.1 as before) and with CUDA kernels running synchronously (CUDA_LAUNCH_BLOCKING=1), demo_video/demo.py fails with either

  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 159, in forward
    attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
RuntimeError: CUDA error: out of memory

or

  File "/MaskFreeVIS/demo_video/../mask2former/modeling/backbone/swin.py", line 159, in forward
    attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)
RuntimeError: CUDA error: an illegal memory access was encountered

Do you think the problem is that the GPU has only 6 GB of dedicated memory?
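
A back-of-envelope estimate can help judge the memory question. The traceback shows all frames going through the backbone in one call, so the attention tensor on line 159 of swin.py should grow linearly with the number of frames. The sketch below is only a rough estimate under stated assumptions, none of which come from this thread: window size 12, 6 heads and patch size 4 for the first Swin-L stage, float32 values, and a hypothetical 720x1280 frame size. It counts a single attention tensor and ignores weights, activations of other layers and the temporary created by the addition.

```python
import math


def window_attention_bytes(height, width, frames, window=12, heads=6,
                           patch=4, bytes_per_elem=4):
    """Rough size in bytes of one (B_, heads, N, N) attention tensor
    in the first Swin stage, where B_ = frames * windows_per_frame.

    All defaults are assumptions (Swin-L style settings, float32)."""
    # Feature-map size after patch embedding.
    h = math.ceil(height / patch)
    w = math.ceil(width / patch)
    # The feature map is padded up to a multiple of the window size.
    windows_per_frame = math.ceil(h / window) * math.ceil(w / window)
    tokens = window * window  # N in the traceback
    return frames * windows_per_frame * heads * tokens * tokens * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical frame size; substitute the real input resolution.
    for n in (1, 7, 8, 16):
        mib = window_attention_bytes(720, 1280, n) / 2**20
        print(f"{n:2d} frames: ~{mib:,.0f} MiB for one attention tensor")
```

Under these assumptions a single such tensor already takes on the order of 200 MB per frame, so a clip of more than a handful of frames could plausibly exhaust a 6 GB card that also has to hold the model weights.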