huggingface/swift-coreml-diffusers

On an Intel Mac with a discrete GPU, generation on the GPU outputs a random pattern

bdev36 opened this issue · 5 comments

Running on an Intel iMac (2020) with a discrete Radeon 5700 (8 GB), the result always looks like the attached screenshot.

I've cloned the repository:

  • Diffusion-macOS: the problem is identical. The GPU is doing the work but the result is random pixels.
  • Diffusion-macOS: using ComputeUnits.cpuOnly (two modifications to ControlsView.swift), the CPU does the work slowly and the result is OK (see the sketch below).
  • Diffusion: the CPU is doing the work and the result is OK.

In all cases, no error or exception is raised.
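
For context, the ComputeUnits.cpuOnly workaround boils down to telling Core ML not to use the GPU at all. A minimal Swift sketch of the idea with plain Core ML API (not the actual ControlsView.swift change; the "Unet" resource name is a placeholder):

    import CoreML

    // Sketch only: force Core ML inference onto the CPU, bypassing the
    // discrete-GPU path that produces the random-pixel output here.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly   // instead of the default .all

    // Placeholder resource name; the real pipeline loads several models.
    let url = Bundle.main.url(forResource: "Unet", withExtension: "mlmodelc")!
    let unet = try! MLModel(contentsOf: url, configuration: config)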

The console output is very similar:

Generating...
Got images: [Optional(<CGImage 0x7f813e3b19c0> (IP)
	<<CGColorSpace 0x60000192dda0> (kCGColorSpaceDeviceRGB)>
		width = 512, height = 512, bpc = 8, bpp = 24, row bytes = 1536 
		kCGImageAlphaNone | 0 (default byte order)  | kCGImagePixelFormatPacked 
		is mask? No, has masking color? No, has soft mask? No, has matte? No, should interpolate? Yes)] in 17.003490924835205

Diffusion also outputs this for each step:

2023-04-22 17:30:02.375841+0200 Diffusion[7894:267125] [API] cannot add handler to 3 from 3 - dropping

[Screenshot 2023-04-22 at 17 02 43]

Kila2 commented

I get the same output image.
I tested with stable-diffusion-webui: if the --no-half option is set it generates a correct image, otherwise it outputs a black image. I think the problem is related to model precision.

Kila2 commented


I found a solution: convert the model manually and set the precision to FP32.
Here is an example, based on https://github.com/apple/ml-stable-diffusion:

    # Excerpt adapted from the model-conversion code in apple/ml-stable-diffusion.
    # torchscript_module, sample_inputs, output_names and args come from the
    # surrounding conversion script; the important change is compute_precision,
    # which forces full FP32 instead of the default FP16 used for ML programs.
    import coremltools as ct

    coreml_model = ct.convert(
        torchscript_module,
        convert_to="mlprogram",
        minimum_deployment_target=ct.target.macOS13,
        inputs=_get_coreml_inputs(sample_inputs, args),
        outputs=[ct.TensorType(name=name) for name in output_names],
        compute_units=ct.ComputeUnit[args.compute_unit],
        compute_precision=ct.precision.FLOAT32,  # <-- the fix: full precision
        # skip_model_load=True,
    )
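
The app then loads the converted model with the GPU enabled again. A minimal Swift sketch of that side: the FP32 .mlpackage produced above is compiled to an .mlmodelc and loaded with GPU compute units. The path is a placeholder, and this is not the actual pipeline-loading code in Diffusion-macOS:

    import CoreML

    // Sketch only: compile the FP32 .mlpackage and load it with GPU execution
    // enabled again, now that half precision is out of the picture.
    let packageURL = URL(fileURLWithPath: "/path/to/converted/Unet.mlpackage")  // placeholder path
    let compiledURL = try! MLModel.compileModel(at: packageURL)

    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU   // the GPU is safe with FLOAT32 weights
    let unet = try! MLModel(contentsOf: compiledURL, configuration: config)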

pcuenca commented

Great investigation @Kila2! How does it affect speed?

Nice finding, thanks!

bdev36 commented

@pcuenca To answer your question (debugging the app under Xcode, with default settings, the labrador prompt, and timing the second generation only), the performance ratio is roughly 25:1:

  • Built-in 2.1 model on CPU (8-core i7 @ 3.8 GHz): 446 s
  • The same model, converted manually to FLOAT32, on GPU (Radeon Pro 5700, 8 GB) + CPU: 17 s

Thanks again to @Kila2 and you. It's now working perfectly on the GPU.