IDEA-Research/Grounded-SAM-2

"ms_deform_attn_forward_cuda" not implemented for 'BFloat16'

Opened this issue · 7 comments

Hello!

This is the problem I hit when I use grounded_sam2_local_demo.py for image inference.

Hi @ChinChyi

This is caused by the Deformable Attention operator, which does not support BFloat16 inference. We will fix this bug later.

Hi @ChinChyi

Have you changed any code in your local env? We have fixed this bug in our original implementation here:

torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

by removing the following lines:

# FIXME: figure how does this influence the G-DINO model
torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

if torch.cuda.get_device_properties(0).major >= 8:
    # turn on tfloat32 for Ampere GPUs (https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices)
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

and entering autocast only after running Grounding DINO.
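The reordering described above can be sketched as follows. This is a minimal sketch, not the repo's actual API: `grounding_dino_step` and `sam2_step` are hypothetical placeholder callables standing in for the two model stages. The point is that Grounding DINO runs in full float32 (its `ms_deform_attn_forward_cuda` kernel has no BFloat16 implementation), and bfloat16 autocast is entered only afterwards, for the SAM 2 part.

```python
import torch

def run_pipeline(grounding_dino_step, sam2_step, device_type="cuda"):
    # The two step arguments are hypothetical placeholders for illustration.
    # 1) Run Grounding DINO in full precision -- no autocast active here,
    #    so the deformable-attention CUDA kernel never sees bfloat16 tensors.
    boxes = grounding_dino_step()

    # Optional: enable TF32 on Ampere+ GPUs, as the demo script does.
    if torch.cuda.is_available() and torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True

    # 2) Enter bfloat16 autocast only now, scoped with `with` rather than a
    #    bare .__enter__(), so the autocast state cannot leak into later calls.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        masks = sam2_step(boxes)
    return boxes, masks
```

Using a scoped `with` block instead of a global `.__enter__()` may also help with the report above of the second `self.sam2_predictor.predict` call failing, since autocast state no longer persists between calls.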

So, what is the solution? I have also encountered this problem. Thank you very much!

This error happened when I called self.sam2_predictor.predict twice.


Would you like to share your code with us? That would make it more convenient for us to debug this issue.


Thanks

@ChinChyi Changing bfloat16 to float16 (torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__() to torch.autocast(device_type="cuda", dtype=torch.float16).__enter__()) helped for me.
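The float16 swap above can be written in scoped form. This is a sketch of the workaround only, not the repo's code: float16 steers autocast toward kernels that do have a Half implementation, sidestepping the missing BFloat16 one, though float16's narrower exponent range can overflow where bfloat16 would not.

```python
import torch

# Sketch of the float16 workaround from the comment above: request float16
# instead of bfloat16 under autocast. A scoped `with` block is preferable to
# calling .__enter__() without a matching .__exit__().
def autocast_fp16(device_type="cuda"):
    return torch.autocast(device_type=device_type, dtype=torch.float16)

# usage (sam2_predictor is whatever predictor object you already have):
# with autocast_fp16():
#     masks = sam2_predictor.predict(...)
```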