Ttl/ComfyUi_NNLatentUpscale

Model loader always defaults to CUDA regardless of what backend is being used. Breaks systems without NVIDIA cards.

NeedsMoar opened this issue · 3 comments

I'm currently running the DirectML backend on Windows and got this working by removing the explicit CUDA usage wherever it appeared and explicitly running the neural network on the CPU; for some reason torch didn't want to load the model on a DirectML device (which shows up as the "privateuse0" device type). I may have messed something up there; normally both backends just work until something vendor-specific gets involved, so it might be worth looking into. Converting the output to that device type doesn't seem to be a problem, though, since I was able to do it at the return from the upscale function without issues (see below); it was only the torch.load call in resizer that failed. Normally there are no issues when loading is done through model_management.load_model(). In my case it complained that privateuse0 wasn't in the list of available device types.
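For reference, here's roughly what my workaround looked like (a minimal sketch with illustrative names, not the extension's actual code): load the weights on the CPU, run the upscaler network there, and only convert the result to the active device on the way out.

```python
import torch
import comfy.model_management as mm

def load_resizer_weights(path):
    # map_location="cpu" keeps torch.load from trying to materialize the
    # checkpoint on a device type it doesn't recognize (privateuse0).
    return torch.load(path, map_location="cpu")

def upscale_latent(model, latent):
    device = mm.get_torch_device()   # whatever backend ComfyUI selected
    out = model(latent.to("cpu"))    # the network itself stays on the CPU
    return out.to(device)            # converting at the return works fine
```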

I should note that the first time I tried this, I had also converted the return value of the function in your check-in to use the CPU, as you did. I was sending it to a second KSampler node (theoretically for added detail, but also to correct shape issues with QR Code). When I did that, KSampler ran on the CPU. I don't know the flow of data through Comfy well enough to be sure, but my guess is that since most of the samplers/schedulers either don't specify a device (or, in the case of most schedulers, always run on the CPU), they get set up to use the GPU when they're constructed but otherwise run wherever their input tensors live. A toy illustration of that guess is below.
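This is hypothetical, not Comfy's actual code, but it shows the behavior I mean: work downstream of a node tends to run on whatever device the incoming latent already lives on, so returning it on the CPU drags the follow-on sampling there too.

```python
import torch

latent = torch.randn(1, 4, 64, 64, device="cpu")  # upscaler output returned on CPU
noise = torch.randn_like(latent)                   # created on the same device
denoised = latent + 0.1 * noise                    # so the "sampling" math runs on CPU
print(denoised.device)                             # cpu

# Moving the latent to the active device first keeps the follow-on work there:
# latent = latent.to(active_device)
```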

Edit: It looks like the above got cut off when I first posted, for some reason, but that last bit is probably relevant to how the check-in will behave on non-CUDA devices.

Ttl commented

I'm not sure what the exact issue is. I changed the torch.load to the same method that ComfyUI uses, so if that works with DirectML it should work here too.
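Roughly this (a sketch of the idea, not the exact commit): go through ComfyUI's own checkpoint loader instead of calling torch.load directly, so device handling and safetensors support match what ComfyUI itself does.

```python
import torch
import comfy.utils

def load_weights(model: torch.nn.Module, weight_path: str):
    # before: sd = torch.load(weight_path)
    sd = comfy.utils.load_torch_file(weight_path)  # ComfyUI's loader
    model.load_state_dict(sd)
    return model
```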

Oops, left that comment at the same time as yours. I'm guessing that will work for me (and it'll keep the IPEX optimizations from being bypassed for people on Intel cards). Do check that last part about samplers running on the CPU when the results are output to that device, though; that's another bit of weirdness.

Just checking in to let you know that everything is working as expected. Now it's off to complain to Microsoft that torch.load can't load a model on a DirectML device identified by torch-directml's own returned device value. I think part of the issue is that they import the DirectML.dll library into Python and call a bunch of native_ functions inside it, but when I popped the library open, none of them are exported, nor are they documented anywhere.
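For anyone curious, this is the failure mode I mean (hypothetical repro; the file path is just a placeholder):

```python
import torch
import torch_directml

dml = torch_directml.device()  # the device object torch-directml itself returns
# Fails with a complaint that the device type isn't in the list of available types:
sd = torch.load("weights.pt", map_location=dml)
```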

Works well for me so far. :D