huggingface/swift-coreml-diffusers

On an Intel Mac with a discrete GPU, generation on the GPU outputs a random pattern

bdev36 opened this issue · 5 comments

Running on an Intel iMac (2020) with a discrete Radeon 5700 (8 GB), the result always looks like the attached screenshot.

I've cloned the repository:

  • Diffusion-macOS: the problem is identical. The GPU is doing the work but the result is random pixels.
  • Diffusion-macOS: using ComputeUnits.cpuOnly (two modifications to ControlsView.swift), the CPU does the work slowly and the result is OK (see the sketch below).
  • Diffusion: the CPU is doing the work and the result is OK.

In all cases, no error or exception is raised.
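
For context, the ComputeUnits.cpuOnly workaround boils down to telling Core ML not to use the GPU at all. A minimal Swift sketch of the idea with plain Core ML API (not the actual ControlsView.swift change; the "Unet" resource name is a placeholder):

    import CoreML

    // Sketch only: force Core ML inference onto the CPU, bypassing the
    // discrete-GPU path that produces the random-pixel output here.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly   // instead of the default .all

    // Placeholder resource name; the real pipeline loads several models.
    let url = Bundle.main.url(forResource: "Unet", withExtension: "mlmodelc")!
    let unet = try! MLModel(contentsOf: url, configuration: config)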

The console output is very similar:

Generating...
Got images: [Optional(<CGImage 0x7f813e3b19c0> (IP)
	<<CGColorSpace 0x60000192dda0> (kCGColorSpaceDeviceRGB)>
		width = 512, height = 512, bpc = 8, bpp = 24, row bytes = 1536 
		kCGImageAlphaNone | 0 (default byte order)  | kCGImagePixelFormatPacked 
		is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes)] in 17.003490924835205

Diffusion also outputs this for each step:

2023-04-22 17:30:02.375841+0200 Diffusion[7894:267125] [API] cannot add handler to 3 from 3 - dropping

[Screenshot 2023-04-22 at 17 02 43]

Kila2 commented

I get the same output image.
I tested with stable-diffusion-webui: if the --no-half option is set it generates a correct image, otherwise it outputs a black image. I think the problem is related to model precision.

Kila2 commented


I found a solution: convert the model manually and set the precision to FP32.
Here is an example, based on https://github.com/apple/ml-stable-diffusion:

    # Excerpt adapted from the model-conversion code in apple/ml-stable-diffusion.
    # torchscript_module, sample_inputs, output_names and args come from the
    # surrounding conversion script; the important change is compute_precision,
    # which forces full FP32 instead of the default FP16 used for ML programs.
    import coremltools as ct

    coreml_model = ct.convert(
        torchscript_module,
        convert_to="mlprogram",
        minimum_deployment_target=ct.target.macOS13,
        inputs=_get_coreml_inputs(sample_inputs, args),
        outputs=[ct.TensorType(name=name) for name in output_names],
        compute_units=ct.ComputeUnit[args.compute_unit],
        compute_precision=ct.precision.FLOAT32,  # <-- the fix: full precision
        # skip_model_load=True,
    )
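
The app then loads the converted model with the GPU enabled again. A minimal Swift sketch of that side: the FP32 .mlpackage produced above is compiled to an .mlmodelc and loaded with GPU compute units. The path is a placeholder, and this is not the actual pipeline-loading code in Diffusion-macOS:

    import CoreML

    // Sketch only: compile the FP32 .mlpackage and load it with GPU execution
    // enabled again, now that half precision is out of the picture.
    let packageURL = URL(fileURLWithPath: "/path/to/converted/Unet.mlpackage")  // placeholder path
    let compiledURL = try! MLModel.compileModel(at: packageURL)

    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU   // the GPU is safe with FLOAT32 weights
    let unet = try! MLModel(contentsOf: compiledURL, configuration: config)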

pcuenca commented

Great investigation @Kila2! How does it affect speed?

Nice finding, thanks!

bdev36 commented

@pcuenca To answer your question (debugging the app under Xcode, with default settings, the labrador prompt, and timing the second generation only), the performance ratio is roughly 25:1:

  • Built-in 2.1 model on CPU (8-core i7 @ 3.8 GHz): 446 s
  • The same model, converted manually to FLOAT32, on GPU (Radeon Pro 5700, 8 GB) + CPU: 17 s

Thanks again to @Kila2 and you. It's now working perfectly on the GPU.