argmaxinc/DiffusionKit

Python example code issue

Skisquaw opened this issue · 3 comments

I followed the installation instructions using conda, and the install worked fine:
$ pip show diffusionkit
Name: diffusionkit
Version: 0.3.2
Summary: Argmax Model Optimization Toolkit for Diffusion Models.
Home-page: https://github.com/argmaxinc/DiffusionKit

These also worked fine:
cd diffusionkit/tests/torch2coreml
python test_mmdit.py --sd3-ckpt-path stabilityai/stable-diffusion-3-medium --model-version 2b -o ~/Dev/mlpackages
python test_vae.py --sd3-ckpt-path stabilityai/stable-diffusion-3-medium -o ~/Dev/mlpackages

I then ran this Python code:

from diffusionkit.mlx import DiffusionPipeline

pipeline = DiffusionPipeline(
    model="argmaxinc/stable-diffusion",
    shift=3.0,
    use_t5=False,
    model_version="stable-diffusion-3-medium",
    low_memory_mode=True,
    a16=True,
    w16=True,
)

HEIGHT = 512
WIDTH = 512
NUM_STEPS = 50   # 4 for FLUX.1-schnell, 50 for SD3
CFG_WEIGHT = 5.  # 0. for FLUX.1-schnell, 5. for SD3

image, _ = pipeline.generate_image(
    "a photo of a cat",
    cfg_weight=CFG_WEIGHT,
    num_steps=NUM_STEPS,
    latent_size=(HEIGHT // 8, WIDTH // 8),
)

image.save("cat_sd.png")

I get this error after a bit:
scikit-learn version 1.5.1 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.4.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
INFO:diffusionkit.mlx:Pre text encoding peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre text encoding active memory: 0.0GB
miniforge3/envs/diffusionkit032/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
INFO:diffusionkit.mlx:Post text encoding peak memory: 2.597GB
INFO:diffusionkit.mlx:Post text encoding active memory: 1.786GB
INFO:diffusionkit.mlx:Text encoding time: 2.284s
INFO:diffusionkit.mlx:Pre denoise peak memory: 0.0GB
INFO:diffusionkit.mlx:Pre denoise active memory: 0.018GB
INFO:diffusionkit.mlx:Seed: 1724442915
0%| | 0/50 [00:00<?, ?it/s]INFO:diffusionkit.mlx.mmdit:Cached modulation_params for timesteps=array([1000, 993, 986, ..., 66.875, 8.92969, 0], dtype=float16)
INFO:diffusionkit.mlx.mmdit:Cached modulation_params will reduce peak memory by 1.3 GB
ERROR:diffusionkit.mlx.mmdit:Error in pre_sdpa: [layer_norm] weight must have 1 dimension but has 2 dimensions.
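For context on that last error: the check being tripped is the standard layer-norm contract, where the learned scale must be a 1-D vector over the normalized axis, so a weight tensor that arrives with a 2-D shape is rejected before SDPA runs. A minimal NumPy sketch of that contract (my own illustration of the error condition, not DiffusionKit's or MLX's code):

```python
import numpy as np

def layer_norm(x, weight, bias, eps=1e-5):
    """Normalize over the last axis; weight and bias must be 1-D of that size."""
    if weight.ndim != 1:
        # Mirrors the shape check behind the error in the log above.
        raise ValueError(
            f"[layer_norm] weight must have 1 dimension but has {weight.ndim} dimensions."
        )
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * weight + bias

x = np.random.randn(2, 4).astype(np.float32)
print(layer_norm(x, np.ones(4), np.zeros(4)).shape)  # 1-D weight: works, (2, 4)
try:
    layer_norm(x, np.ones((1, 4)), np.zeros(4))      # 2-D weight: reproduces the error
except ValueError as e:
    print(e)
```

This suggests the SD3 checkpoint's norm weights are being loaded with an extra dimension somewhere in the MMDiT path, while the FLUX path (below) loads them correctly.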

Note that the FLUX sample Python code works fine:

INFO:diffusionkit.mlx:============= Summary =============
INFO:diffusionkit.mlx:Text encoder: 1.0s
INFO:diffusionkit.mlx:Denoising: 11.3s
INFO:diffusionkit.mlx:Image decoder: 0.4s
INFO:diffusionkit.mlx:Peak memory: 16.6GB
INFO:diffusionkit.mlx:============= Inference Context =============
INFO:diffusionkit.mlx:Operating System:
{'os_build_number': '23F79', 'os_type': 'macOS', 'os_version': '14.5'}
INFO:diffusionkit.mlx:Device:
{'cpu_core_count': 16,
'gpu_core_count': 40,
'max_ram': '136494940160',
'product_name': 'Apple M3 Max'}
INFO:diffusionkit.mlx:Total time: 13.335s

This might be a bug in sdpa. Can you try PyTorch 2.3?
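If it helps, a sketch of the downgrade inside the conda env (the 2.3 pin follows the suggestion above; the scikit-learn cap comes from the coremltools warning in the log, and exact patch versions are a guess, not pins confirmed by the repo):

```shell
# Pin torch to the 2.3 series as suggested:
pip install "torch==2.3.*"
# coremltools' own warning caps supported scikit-learn at 1.1.2:
pip install "scikit-learn<=1.1.2"
# Confirm what ended up installed:
python -c "import torch, sklearn; print(torch.__version__, sklearn.__version__)"
```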

#25 will fix this bug.