integer division or modulo by zero
forensicmike opened this issue · 2 comments
- Using Anaconda on Windows, I followed the steps in the Setup section, all successful.
- Downloaded all 4 files from https://huggingface.co/shi-labs/versatile-diffusion/tree/main/pretrained_pth into the `./pretrained` folder.
When I run the command:

```
(versatile-diffusion) C:\Users\mike\Desktop\Versatile-Diffusion>python inference.py --gpu 0 --app image-variation --image ..\invokeai\inputs\00003.png --seed 8 --save log\test.png --coloradj simple
```

I get:

```
Traceback (most recent call last):
  File "inference.py", line 565, in <module>
    vd_wrapper = vd_inference(pth=pth, fp16=args.fp16, device=device)
  File "inference.py", line 35, in __init__
    net = get_model()(cfgm)
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\model_zoo\common\get_model.py", line 87, in __call__
    net = self.model[t](**args)
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\model_zoo\vd.py", line 220, in __init__
    super().__init__(*args, **kwargs)
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\model_zoo\sd.py", line 55, in __init__
    highlight_print("Running in {} mode".format(self.parameterization))
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\model_zoo\sd.py", line 21, in highlight_print
    print_log('')
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\log_service.py", line 16, in print_log
    local_rank = sync.get_rank('local')
  File "C:\Users\mike\Desktop\Versatile-Diffusion\lib\sync.py", line 35, in get_rank
    return global_rank % local_world_size
ZeroDivisionError: integer division or modulo by zero
```
After reviewing line 35 in `sync.py`, it appears that it is dividing by `torch.cuda.device_count()`. I did a little searching, and it seems like it is normal/expected for this to return 0 if you have 1 GPU.
If I add in a check, `if local_world_size == 0: return 0`, I am able to get past that step.
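To illustrate the failure mode and the proposed guard, here is a minimal sketch of the division step. The function signature is simplified for illustration (the real `sync.get_rank` in the repo takes a string argument and reads `local_world_size` from `torch.cuda.device_count()` internally):

```python
def get_rank(global_rank: int, local_world_size: int) -> int:
    # In the repo, local_world_size comes from torch.cuda.device_count(),
    # which returns 0 on a CPU-only torch build; the bare modulo below then
    # raises ZeroDivisionError.
    if local_world_size == 0:
        # Proposed guard: treat the no-visible-GPU case as local rank 0
        # instead of crashing in the logging path.
        return 0
    return global_rank % local_world_size

print(get_rank(0, 0))  # 0 instead of ZeroDivisionError
print(get_rank(5, 4))  # 1
```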
I think this was due to having a non-CUDA-enabled torch build installed. Once I had this fixed, I was able to remove my change and get it to work. Still, the error message associated with this failure wasn't ideal, so it might be worth adding some checks?
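As a sketch of the kind of check that would have made the failure obvious, a hypothetical startup helper (the name `cuda_problem` is mine, not from the repo) could report the CPU-only-torch situation up front instead of letting a `ZeroDivisionError` surface deep in the logging path:

```python
def cuda_problem():
    """Return a human-readable problem description, or None if CUDA is usable."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed."
    if not torch.cuda.is_available():
        # This is the situation hit here: a CPU-only torch build, where
        # torch.cuda.device_count() returns 0 and the modulo crashes.
        return ("No CUDA device found; torch.cuda.device_count() will be 0. "
                "Install a CUDA-enabled torch build or add a CPU fallback.")
    return None
```

Calling this once at the top of `inference.py` and exiting with the returned message would replace the opaque traceback with an actionable error.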
Correct, the root cause is that there is no CUDA, so no GPU can be found.