hidet-org/hidet

[Bug] Stable Diffusion Compilation Errors

AlphaAtlas opened this issue · 32 comments

Describe the bug
Stable Diffusion pipeline compilation does not function properly. Even when errors are suppressed as described in the related issue, nvcc and Python modules eventually start erroring out, and when the compilation is finally "done," the speed is the same as eager mode.

#202

To Reproduce
I posted the simple test script here: https://github.com/AlphaAtlas/Diffusion-Compilaton-Testing/blob/main/hidet_test.py

Along with a full log of the run on my machine: https://github.com/AlphaAtlas/Diffusion-Compilaton-Testing/blob/main/hidet.log

Environment

  • Hidet nightly (as of this post)
  • OS: Arch Linux
  • GPU: RTX 2060
  • Others: Nvidia Driver 530.41.03, Python 3.11, CUDA 12.1, Torch 2.1 Nightly

I understand diffusion and torch+cu121 support is probably a work in progress. 👍 But I figured I would post my findings here anyway.

On a side note, this was tested with dynamic=False, but dynamic=True is practically a requirement for Stable Diffusion use outside of testing.
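For reference, the dynamic variant would just be the following (hypothetical on my end, I have not gotten this far with hidet):

# Hypothetical: compile the UNet with dynamic shapes, so changing the
# resolution or batch size does not trigger a full recompile
pipe.unet = torch.compile(pipe.unet, backend="hidet", dynamic=True)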

Hi @AlphaAtlas,

Could you try the following script:

from diffusers import StableDiffusionPipeline
import torch
import torch._dynamo
import hidet

hidet.option.cache_dir('./outs/cache')

#Create pipeline
#Change path to a remote model
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

#Typical diffusers pipeline optimizations
print("### Pre Optimization Benchmark:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

# pipe.unet.to(memory_format=torch.channels_last) #Unsupported by hidet, but does not seem to make a difference if disabled.
pipe.enable_vae_tiling()
# pipe.enable_xformers_memory_efficient_attention()

print("### Post optimization benchmark:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]



#Set compile
torch._dynamo.config.suppress_errors = True
# search space 0: use default schedules only (no tuning)
hidet.torch.dynamo_config.search_space(0)
# automatically transform the model to use float16 data type
hidet.torch.dynamo_config.use_fp16(True)
# use float16 data type as the accumulate data type in operators with reduction
hidet.torch.dynamo_config.use_fp16_reduction(True)
# use tensorcore
hidet.torch.dynamo_config.use_tensor_core()
pipe.unet = torch.compile(pipe.unet, backend="hidet")

print("### torch.compile warmup:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

print("torch.compile benchmark:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

Could you please try the above script and share its log, along with the operator cache in the 'outs/cache' directory?

I will let @xinli-git follow up with you.

outs.zip
hidet_debug_log.txt

Thanks for taking a look 👍

Also, not sure if this is relevant, but the git version of Diffusers being tested here works with Inductor, with no graph breaks I think.

And some more system info gathered with the pytorch bug reporting script:

PyTorch version: 2.1.0.dev20230510+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: CachyOS (x86_64)
GCC version: (GCC) 13.1.1 20230504
Clang version: 15.0.7
CMake version: version 3.26.3
Libc version: glibc-2.37

Python version: 3.11.3 (main, May  4 2023, 16:07:26) [GCC 13.1.1 20230429] (64-bit runtime)
Python platform: Linux-6.3.1-zen2-1-zen-x86_64-with-glibc2.37
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060 with Max-Q Design
Nvidia driver version: 530.41.03
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.8.0
/usr/lib/libcudnn_adv_infer.so.8.8.0
/usr/lib/libcudnn_adv_train.so.8.8.0
/usr/lib/libcudnn_cnn_infer.so.8.8.0
/usr/lib/libcudnn_cnn_train.so.8.8.0
/usr/lib/libcudnn_ops_infer.so.8.8.0
/usr/lib/libcudnn_ops_train.so.8.8.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   44 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          16
On-line CPU(s) list:             0-15
Vendor ID:                       AuthenticAMD
Model name:                      AMD Ryzen 9 4900HS with Radeon Graphics
CPU family:                      23
Model:                           96
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
Stepping:                        1
Frequency boost:                 enabled
CPU(s) scaling MHz:              68%
CPU max MHz:                     3000.0000
CPU min MHz:                     1400.0000
BogoMIPS:                        5988.64
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization:                  AMD-V
L1d cache:                       256 KiB (8 instances)
L1i cache:                       256 KiB (8 instances)
L2 cache:                        4 MiB (8 instances)
L3 cache:                        8 MiB (2 instances)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-15
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Vulnerable
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] clip-anytorch==2.5.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.2
[pip3] pytorch-triton==2.1.0+7d1a95b046
[pip3] torch==2.1.0.dev20230510+cu121
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==1.0.0rc0
[pip3] torchsde==0.2.5
[pip3] torchvision==0.16.0.dev20230510+cu121
[conda] Could not collect

Hi @AlphaAtlas, thanks for your interest in using Hidet, and really appreciate the detailed bug description!

There are two observations I had

  1. You are using a very new version of the diffusers library that introduces some additional unexpected behavior, such as this error: "Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___resnets_0_conv1 with HidetConv2d(tensor(...))". This was not observed with the older diffusers==0.11. I will work on a fix.

  2. The nvcc compilation bug seems to be gone with search_space(0). This makes me wonder whether the initial error "Failed to compile the lowered source code via nvcc." in the log is consistently reproducible with search_space(2); in principle, the bug should appear in both cases. I was also not able to reproduce it on my end with a 2080 Ti. Would it be possible to confirm that the initial nvcc error is consistently reproducible under the initial setup (search_space(2))?

Therefore, related to 1, if you downgrade diffusers to 0.11, you should see no errors at all. In the meantime, I will work on a fix to address 1 and wait for your confirmation on the consistency of 2.

Thanks for the support!

2: Yeah, nvcc compilation failures seem to be happening consistently with search space 2 (or 1). The nvcc errors always start about halfway through each compilation sequence:

Failed to compile the lowered source code via nvcc.████████▌ | 696/1092 [07:02<04:35, 1.44it/s]

Are there any extra debug options I can add to figure out why nvcc is failing? In the meantime, the search space 2 test is running, and I can upload the outs directory when it is done.

1: Rolling back diffusers that far is problematic, but it would be an interesting test. I will try that when the hidet run is done.

Oh, the log folder is huge; I had to split it.
hidet_debug_log_space_2.txt
cuda_space_2_fused.zip
outs.zip

Here is a log from 0.11
hidet_diffusers_0.11.txt

Google Colab appears to have the same behavior. Here is a log and the ipynb used:
colab.txt
Hidet_diffusion_test.ipynb.txt

Sounds good! I'll look into this as well :)

Hi @AlphaAtlas, some updates:
There are three "errors" in the log:

  • Failed to compile the source code with nvcc (here). This should be fixed once #227 is merged.
  • Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet. For this one, you must not enable channels_last before sending the model to torch.compile(..., backend='hidet') (see the sketch after this list).
  • We also do not support torch.nn.functional.scaled_dot_product_attention yet, but this should be added within the next week or two. For now, you can safely ignore this error with suppress_errors and let hidet compile the rest of the graph.
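For the second error, a possible workaround (just a sketch, untested on your exact setup) is to force the UNet weights back to the default contiguous layout before compiling:

# Sketch: undo channels_last so hidet's from_dlpack sees compact tensors
pipe.unet.to(memory_format=torch.contiguous_format)
pipe.unet = torch.compile(pipe.unet, backend="hidet")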

Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet. For this one, you must not enable channels_last before sending the model to torch.compile(..., backend='hidet')

The "Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet." error is occurring even with the channels_last code commented out, as it is in the above test script and the Colab test.

Would you be able to test it again with the latest version of hidet after pulling the change?
Below is my log with search space 1 and channels_last commented out:

logs.txt

from diffusers import StableDiffusionPipeline
import torch
import torch._dynamo
import hidet

#Create pipeline
#Change path to a remote model
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

#Typical diffusers pipeline optimizations
#print("### Pre Optimization Benchmark:")
#image = pipe("a photo of an astronaut riding a horse on mars").images[0]
#
# pipe.unet.to(memory_format=torch.channels_last) #Unsupported by hidet, but does not seem to make a difference if disabled.
pipe.enable_vae_tiling()
#pipe.enable_xformers_memory_efficient_attention()
#
#print("### Post optimization benchmark:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]


hidet.option.parallel_build(False)

#Set compile
torch._dynamo.config.suppress_errors = True
# search space 1: small auto-tuning space
hidet.torch.dynamo_config.search_space(1)
hidet.torch.dynamo_config.dump_graph_ir("./local_graph")
hidet.option.cache_dir("local_cache")
# automatically transform the model to use float16 data type
hidet.torch.dynamo_config.use_fp16(True)
# use float16 data type as the accumulate data type in operators with reduction
hidet.torch.dynamo_config.use_fp16_reduction(True)
# use tensorcore
hidet.torch.dynamo_config.use_tensor_core()
pipe.unet = torch.compile(pipe.unet, backend="hidet")

print("### torch.compile warmup:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

print("torch.compile benchmark:")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]

Hmmm, I am having trouble installing the git version of hidet. It imports just fine, but when the script gets to torch.compile(..., backend="hidet"), torch says it can't find the hidet backend. I have set all the environment variables from here: https://docs.hidet.org/stable/getting-started/build-from-source.html

Does it build at 7 every day? I can just wait for tomorrow's nightly build.

Ah, I have been mostly doing it this way:

git clone ...
cd hidet
mkdir build && cd build
cmake ..
make
cd ..
pip install -e .

Hi @AlphaAtlas, after you build the shared library, you can try

$ pip install -e .

under the root of the repo.
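To double-check that the editable install is the one being picked up, something like this should print True (assuming torch._dynamo.list_backends() is available in your nightly):

import torch
import torch._dynamo
import hidet  # importing hidet should make the 'hidet' backend discoverable
print("hidet" in torch._dynamo.list_backends())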

The benchmark finishes now! But some warnings are still there, and it's a bit slower than uncompiled diffusers (possibly because the 2060 is at steady-state rather than turbo clocks):

~/AI/difftest
venv โฏ python hidet_debug.py
/usr/lib/python3.11/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.14.1 when it was built against 1.14.0, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "
100%|██████████| 50/50 [00:08<00:00,  5.62it/s]
/home/alpha/.local/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:482: UserWarning: torch.compile support of Python 3.11 is experimental. Program may segfault.
  warnings.warn(
### torch.compile warmup:
  0%|                                                                             | 0/50 [00:00<?, ?it/s][2023-05-15 18:17:16,627] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_condition.py line 610
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
Compiling cuda task broadcast(data=float16(1280, 1280), out=float16(1, 1280, 1280))...
Compiling cuda task rearrange(x=float16(1, 4, 320, 1280), y=float16(4, 320, 1280))...
Compiling cpu task cast(x=float64(2, 320), y=float16(2, 320))...
Compiling cuda task fused(b=float16(4, 80, 1280), data=float16(2, 320), y=float16(1, 4, 2, 1280), fused_ops='broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(4, 80, 1280), y=float16(4, 80, 1280))...
Compiling cpu task cast(x=float64(1, 4, 2, 1280), y=float16(1, 4, 2, 1280))...
Compiling: 100%|██████████| 113/113 [01:40<00:00,  1.12it/s]
Batch build 113 modules within 101.160 seconds, on average 0.9 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:00<00:00, 815.47it/s]
Compiling cuda task reduce_sum_f16(x=float16(1, 4, 2, 1280), y=float16(1, 2, 1280), dims=[1], keep_dim=False, reduce_type=sum, accumulate_dtype='float32')...
Compiling cuda task fused(b=float16(4, 320, 1280), y=float16(1280,), x=float16(1, 2, 1280), y=float16(1, 4, 2, 1280), fused_ops='reshape add silu broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(4, 320, 1280), y=float16(4, 320, 1280))...
Compiling cpu task cast(x=float64(1280,), y=float16(1280,))...
Compiling cpu task cast(x=float64(1, 2, 1280), y=float16(1, 2, 1280))...
Compiling: 100%|██████████| 113/113 [01:58<00:00,  1.05s/it]
Batch build 113 modules within 118.357 seconds, on average 1.0 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:00<00:00, 344.49it/s]
Compiling cuda task fused(y=float16(1280,), x=float16(1, 2, 1280), z=float16(2, 1280), fused_ops='reshape add', anchor='add')...
[2023-05-15 18:21:28,427] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 851
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-15 18:21:28,474] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 551
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [320, 320, 3, 3] and strides [2880, 1, 960, 320]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-15 18:21:28,545] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/transformer_2d.py line 214
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-15 18:21:28,619] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention.py line 121
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


[2023-05-15 18:21:28,649] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py line 295
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  torch.nn.functional.scaled_dot_product_attention


[2023-05-15 18:21:28,677] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __call__ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py line 872
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  torch.nn.functional.scaled_dot_product_attention


Compiling cuda task rearrange(x=float16(2560, 320), y=float16(320, 2560))...
Compiling cuda task rearrange(x=float16(320, 1280), y=float16(1280, 320))...
Compiling cuda task broadcast(data=float16(320, 2560), out=float16(2, 320, 2560))...
Compiling cuda task broadcast(data=float16(1280, 320), out=float16(2, 1280, 320))...
Compiling cpu task cast(x=float64(2, 4096, 320), y=float16(2, 4096, 320))...
Compiling cuda task fused(a=float16(2, 4096, 320), b=float16(2, 320, 2560), y=float16(2560,), z=float16(2, 4096, 2560), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 320, 2560), y=float16(2, 320, 2560))...
Compiling cpu task cast(x=float64(2560,), y=float16(2560,))...
Compiling cpu task cast(x=float64(2, 4096, 2560), y=float16(2, 4096, 2560))...
Compiling: 100%|██████████| 113/113 [01:35<00:00,  1.18it/s]
Batch build 113 modules within 95.877 seconds, on average 0.8 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:29<00:00,  3.85it/s]
Compiling cuda task fused(b=float16(2, 1280, 320), y=float16(320,), data=float16(2, 4096, 2560), z=float16(2, 4096, 320), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 1280, 320), y=float16(2, 1280, 320))...
Compiling cpu task cast(x=float64(320,), y=float16(320,))...
Compiling: 100%|██████████| 113/113 [01:42<00:00,  1.10it/s]
Batch build 113 modules within 102.725 seconds, on average 0.9 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:08<00:00, 13.08it/s]
[2023-05-15 18:26:02,158] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 199
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [320, 320, 3, 3] and strides [2880, 1, 960, 320]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/register_modules.py", line 52


Compiling cuda task rearrange(x=float16(5120, 640), y=float16(640, 5120))...
Compiling cuda task rearrange(x=float16(640, 2560), y=float16(2560, 640))...
Compiling cuda task broadcast(data=float16(640, 5120), out=float16(2, 640, 5120))...
Compiling cuda task broadcast(data=float16(2560, 640), out=float16(2, 2560, 640))...
Compiling cpu task cast(x=float64(2, 1024, 640), y=float16(2, 1024, 640))...
Compiling cuda task fused(a=float16(2, 1024, 640), b=float16(2, 640, 5120), y=float16(5120,), z=float16(2, 1024, 5120), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 640, 5120), y=float16(2, 640, 5120))...
Compiling cpu task cast(x=float64(5120,), y=float16(5120,))...
Compiling cpu task cast(x=float64(2, 1024, 5120), y=float16(2, 1024, 5120))...
Compiling: 100%|██████████| 113/113 [01:28<00:00,  1.28it/s]
Batch build 113 modules within 88.635 seconds, on average 0.8 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:17<00:00,  6.34it/s]
Compiling cuda task fused(b=float16(2, 2560, 640), y=float16(640,), data=float16(2, 1024, 5120), z=float16(2, 1024, 640), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 2560, 640), y=float16(2, 2560, 640))...
Compiling cpu task cast(x=float64(640,), y=float16(640,))...
Compiling: 100%|██████████| 113/113 [01:46<00:00,  1.06it/s]
Batch build 113 modules within 106.461 seconds, on average 0.9 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:06<00:00, 17.00it/s]
Compiling cuda task rearrange(x=float16(10240, 1280), y=float16(1280, 10240))...
Compiling cuda task rearrange(x=float16(1280, 5120), y=float16(5120, 1280))...
Compiling cuda task broadcast(data=float16(1280, 10240), out=float16(2, 1280, 10240))...
Compiling cuda task broadcast(data=float16(5120, 1280), out=float16(2, 5120, 1280))...
Compiling cpu task cast(x=float64(2, 256, 1280), y=float16(2, 256, 1280))...
Compiling cuda task fused(a=float16(2, 256, 1280), b=float16(2, 1280, 10240), y=float16(10240,), z=float16(2, 256, 10240), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 1280, 10240), y=float16(2, 1280, 10240))...
Compiling cpu task cast(x=float64(10240,), y=float16(10240,))...
Compiling cpu task cast(x=float64(2, 256, 10240), y=float16(2, 256, 10240))...
Compiling: 100%|██████████| 113/113 [01:32<00:00,  1.23it/s]
Batch build 113 modules within 92.221 seconds, on average 0.8 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:12<00:00,  9.14it/s]
Compiling cuda task fused(b=float16(2, 5120, 1280), y=float16(1280,), data=float16(2, 256, 10240), z=float16(2, 256, 1280), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 5120, 1280), y=float16(2, 5120, 1280))...
Compiling: 100%|██████████| 113/113 [01:52<00:00,  1.01it/s]
Batch build 113 modules within 112.640 seconds, on average 1.0 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:06<00:00, 17.10it/s]
[2023-05-15 18:34:36,196] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 948
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 1280, 3, 3] and strides [11520, 1, 3840, 1280]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___resnets_0_conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-15 18:34:36,368] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 559
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-15 18:34:36,372] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __init__ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/container.py line 276
due to:
Traceback (most recent call last):
  File "/home/alpha/.local/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 134, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Guard setup for uninitialized class <class 'torch.nn.modules.container.ModuleList'>

Set torch._dynamo.config.verbose=True or TORCHDYNAMO_VERBOSE=1 for more information


Compiling cuda task rearrange(x=float16(2, 2, 2560, 1280), y=float16(4, 2560, 1280))...
Compiling cpu task cast(x=float64(2, 64, 1280), y=float16(2, 64, 1280))...
Compiling cuda task add(x=float16(2, 64, 1280), y=float16(1280,), z=float16(2, 64, 1280))...
Compiling cuda task fused(a=float16(2, 64, 1280), b=float16(2, 1280, 10240), y=float16(10240,), z=float16(2, 64, 10240), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(2, 64, 10240), y=float16(2, 64, 10240))...
Compiling: 100%|██████████| 113/113 [01:41<00:00,  1.11it/s]
Batch build 113 modules within 101.780 seconds, on average 0.9 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:04<00:00, 25.91it/s]
Compiling cuda task fused(b=float16(4, 2560, 1280), data=float16(2, 64, 10240), y=float16(2, 2, 64, 1280), fused_ops='slice slice gelu mul reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cpu task cast(x=float64(4, 2560, 1280), y=float16(4, 2560, 1280))...
Compiling cpu task cast(x=float64(2, 2, 64, 1280), y=float16(2, 2, 64, 1280))...
Compiling: 100%|██████████| 113/113 [02:16<00:00,  1.21s/it]
Batch build 113 modules within 136.928 seconds, on average 1.2 seconds per module.
Benchmarking: 100%|██████████| 113/113 [00:02<00:00, 40.03it/s]
Compiling cuda task reduce_sum_f16(x=float16(2, 2, 64, 1280), y=float16(2, 64, 1280), dims=[1], keep_dim=False, reduce_type=sum, accumulate_dtype='float32')...
[2023-05-15 18:39:14,940] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 1948
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 2560, 3, 3] and strides [23040, 1, 7680, 2560]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___resnets_0_conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-15 18:39:14,962] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 126
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 1280, 3, 3] and strides [11520, 1, 3840, 1280]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-15 18:39:15,358] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 1849
due to:
Traceback (most recent call last):
  File "/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


/home/alpha/clone/hidet/python/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
100%|██████████| 50/50 [22:12<00:00, 26.65s/it]
torch.compile benchmark:
100%|██████████| 50/50 [00:09<00:00,  5.13it/s]

May I know what version of diffusers you are running?

I pulled the latest git again, since I had downgraded to 0.11 for an earlier test.

diffusers @ git+https://github.com/huggingface/diffusers.git@29b1325a5ae28fa8d7f459b372582287ffc571e5

And I have been running this for stable diffusion anyway because it has some Inductor torch.compile fixes.

Thanks for the link! I tried to run with the version there.

However, I was still not able to reproduce the "contiguous" error with channels_last commented out, although I was able to reproduce the error with channels_last on.

In terms of performance, I recommend that once everything is running functionally, you expand the auto-tuning space with hidet.option.search_space(2). This might take much longer (around 3 to 5x) to tune, but should get you more performance.
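Concretely (my understanding of the levels: 0 uses default schedules only, 1 tunes over a small space, 2 over the full space):

# trade longer tuning time for faster kernels
hidet.option.search_space(2)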

@xinli-git Hmm, what version of PyTorch are y'all running?

# python -c 'import torch; print(torch.__version__)'
2.1.0.dev20230515+cu121

This is my log with search space 1 (operators are cached, so there are no tuning logs):

# python  test.py 
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.0+cu118 with CUDA 1108 (you have 2.1.0.dev20230515+cu121)
    Python  3.10.11 (you have 3.10.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
/home/lixin39/projects/hidet_dev/hidet/python/hidet/utils/stack_limit.py:24: UserWarning: The hard limit for stack size is too small (64.0 MiB), we recommend to increase it to 512.0 MiB. If you are the root user on Linux OS, you could refer to `man limits.conf` to increase this limit.
  warnings.warn(
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with: 
pip install accelerate
.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
100%|██████████| 50/50 [00:04<00:00, 11.34it/s]
### torch.compile warmup:
  0%|                                                                                                                                                                                                                                                                                                                                                        | 0/50 [00:00<?, ?it/s][2023-05-16 02:38:39,248] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py line 610 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
[2023-05-16 02:38:40,113] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py line 851 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-16 02:38:42,082] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/transformer_2d.py line 214 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-16 02:38:42,183] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/attention.py line 121 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


[2023-05-16 02:38:42,228] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/attention_processor.py line 295 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  torch.nn.functional.scaled_dot_product_attention


[2023-05-16 02:38:42,271] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __call__ /root/miniconda3/lib/python3.10/site-packages/diffusers/models/attention_processor.py line 872 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  torch.nn.functional.scaled_dot_product_attention


[2023-05-16 02:38:56,855] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py line 559 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-16 02:38:58,657] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __init__ /root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/container.py line 276 
due to: 
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 134, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Guard setup for uninitialized class <class 'torch.nn.modules.container.ModuleList'>

Set torch._dynamo.config.verbose=True or TORCHDYNAMO_VERBOSE=1 for more information


[2023-05-16 02:39:05,670] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /root/miniconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py line 1849 
due to: 
Traceback (most recent call last):
  File "/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


/home/lixin39/projects/hidet_dev/hidet/python/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
100%|██████████| 50/50 [00:56<00:00,  1.14s/it]
torch.compile benchmark:
100%|██████████| 50/50 [00:05<00:00,  9.25it/s]

I did see an xformers warning; perhaps also disable xformers?

Xformers is already disabled in the script, but huggingface diffusers still tries to import it as a test, I think. It will always throw that warning on torch nightly unless you install it from source with pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

Out in the wild, even with torch SDPA, xformers is pretty much standard for Stable Diffusion... But one thing at a time :P

Hmm... I am going to create a cleaner python environment and test this again.

Out in the wild, even with torch SDPA, xformers is pretty much standard for Stable Diffusion... But one thing at a time :P

Yes, I think I have observed a similar thing. I guess as long as the other parts of the graph are being compiled by hidet, we are happy to delegate SDPA to xformers, as their implementation is very well done.
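In other words, something along these lines (a sketch, assuming xformers is built against the right torch version):

# let xformers handle attention, and let hidet compile the rest of the UNet
pipe.enable_xformers_memory_efficient_attention()
pipe.unet = torch.compile(pipe.unet, backend="hidet")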

Thanks, please let me know if the problem still persists :)

Another test, in a fresh venv, this time with xformers enabled and search space set to 0 (since 1 was taking forever to compile):

~/AI/difftest
venv โฏ python diffusers_example.py
/usr/lib/python3.11/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.14.1 when it was built against 1.14.0, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "
Run torch compile
  0%|                                                                             | 0/50 [00:00<?, ?it/s][2023-05-24 16:43:20,530] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_condition.py line 610
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


Compiling cuda task full(c=int64(2,), shape=[2], value=0, dtype='int64')...
Compiling cuda task fused(y=float16(1, 160), x=int64(2,), y=float32(2, 320), fused_ops='rearrange slice cast cast mul muls sin cos concat slice slice concat cast', anchor='cast')...
/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
Compiling cpu task cast(x=float32(2, 320), y=float16(2, 320))...
Compiling cuda task fused(y=float16(1280,), x=float16(1, 2, 1280), z=float16(2, 1280), fused_ops='reshape add', anchor='add')...
Compiling cuda task reduce_sum_f16(x=float16(1, 4, 2, 1280), y=float16(1, 2, 1280), dims=[1], keep_dim=False, reduce_type=sum, accumulate_dtype='float32')...
Compiling cuda task fused(b=float16(4, 80, 1280), data=float16(2, 320), y=float16(1, 4, 2, 1280), fused_ops='broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cuda task fused(b=float16(4, 320, 1280), y=float16(1280,), x=float16(1, 2, 1280), y=float16(1, 4, 2, 1280), fused_ops='reshape add silu broadcast reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
[2023-05-24 16:43:33,625] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 878
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-24 16:43:33,673] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 591
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [320, 320, 3, 3] and strides [2880, 1, 960, 320]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-24 16:43:33,787] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/transformer_2d.py line 214
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-24 16:43:33,875] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention.py line 121
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


Compiling cpu task cast(x=float32(2, 4096, 320), y=float16(2, 4096, 320))...
Compiling cuda task batch_matmul(a=float16(2, 4096, 320), b=float16(2, 320, 960), c=float16(2, 4096, 960), batch_size=2, m_size=4096, n_size=960, k_size=320, mma='mma')...
Compiling cuda task fused(data=float16(2, 4096, 960), y=float16(16, 4096, 40), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 4096, 960), y=float16(16, 4096, 40), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 4096, 960), y=float16(16, 4096, 40), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
[2023-05-24 16:43:42,187] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT <resume in not_supported_reasons> /home/alpha/.local/lib/python3.11/site-packages/xformers/ops/fmha/flash.py line 209
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <function get_device_capability at 0x7f52bb97e840>


[2023-05-24 16:43:42,215] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT _convert_input_format /home/alpha/.local/lib/python3.11/site-packages/xformers/ops/fmha/flash.py line 127
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
AssertionError: , occurred when interpreting reshape with
  tensor_reshape(tensor(...), [65536, 1, 40])
tensor_reshape is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_methods.py", line 132


[2023-05-24 16:43:42,221] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT _flash_fwd /home/alpha/.local/lib/python3.11/site-packages/xformers/ops/fmha/flash.py line 45
due to:
Traceback (most recent call last):
  File "/home/alpha/.local/lib/python3.11/site-packages/torch/_subclasses/meta_utils.py", line 186, in meta_tensor
    assert not torch._C._dispatch_tls_local_exclude_set().has(
AssertionError:


Compiling cpu task cast(x=float32(16, 4096, 40), y=float16(16, 4096, 40))...
Compiling cuda task fused(b=float16(2, 320, 320), y=float16(320,), x=float16(16, 4096, 40), y=float16(2, 4096, 320), fused_ops='reshape rearrange reshape batch_matmul add divs', anchor='batch_matmul')...
Compiling cpu task cast(x=float32(2, 77, 768), y=float16(2, 77, 768))...
Compiling cuda task fused(a=float16(2, 4096, 320), b=float16(2, 320, 320), y=float16(16, 4096, 40), fused_ops='batch_matmul reshape rearrange reshape', anchor='batch_matmul')...
Compiling cuda task reduce_sum_f16(x=float16(2, 2, 77, 640), y=float16(2, 77, 640), dims=[1], keep_dim=False, reduce_type=sum, accumulate_dtype='float32')...
Compiling cuda task fused(data=float16(2, 77, 640), y=float16(16, 77, 40), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(b=float16(4, 384, 640), x=float16(2, 77, 768), y=float16(2, 2, 77, 640), fused_ops='reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cuda task fused(data=float16(2, 77, 640), y=float16(16, 77, 40), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(a=float16(2, 4096, 320), b=float16(2, 320, 2560), y=float16(2560,), z=float16(2, 4096, 2560), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cuda task fused(b=float16(2, 1280, 320), y=float16(320,), data=float16(2, 4096, 2560), z=float16(2, 4096, 320), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
[2023-05-24 16:44:12,873] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 211
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [320, 320, 3, 3] and strides [2880, 1, 960, 320]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_modules.py", line 52


Compiling cpu task cast(x=float32(2, 1024, 640), y=float16(2, 1024, 640))...
Compiling cuda task batch_matmul(a=float16(2, 1024, 640), b=float16(2, 640, 1920), c=float16(2, 1024, 1920), batch_size=2, m_size=1024, n_size=1920, k_size=640, mma='mma')...
Compiling cuda task fused(data=float16(2, 1024, 1920), y=float16(16, 1024, 80), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 1024, 1920), y=float16(16, 1024, 80), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 1024, 1920), y=float16(16, 1024, 80), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
[2023-05-24 16:44:20,694] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT _minimum_gemm_alignment /home/alpha/.local/lib/python3.11/site-packages/xformers/ops/fmha/cutlass.py line 39
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <function get_device_capability at 0x7f52bb97e840>


Compiling cpu task cast(x=float32(16, 1024, 80), y=float16(16, 1024, 80))...
Compiling cuda task fused(b=float16(2, 640, 640), y=float16(640,), x=float16(16, 1024, 80), y=float16(2, 1024, 640), fused_ops='reshape rearrange reshape batch_matmul add divs', anchor='batch_matmul')...
Compiling cuda task batch_matmul(a=float16(2, 77, 768), b=float16(2, 768, 1280), c=float16(2, 77, 1280), batch_size=2, m_size=77, n_size=1280, k_size=768, mma='mma')...
Compiling cuda task fused(data=float16(2, 77, 1280), y=float16(16, 77, 80), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 77, 1280), y=float16(16, 77, 80), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(a=float16(2, 1024, 640), b=float16(2, 640, 640), y=float16(16, 1024, 80), fused_ops='batch_matmul reshape rearrange reshape', anchor='batch_matmul')...
Compiling cuda task fused(a=float16(2, 1024, 640), b=float16(2, 640, 5120), y=float16(5120,), z=float16(2, 1024, 5120), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cuda task fused(b=float16(2, 2560, 640), y=float16(640,), data=float16(2, 1024, 5120), z=float16(2, 1024, 640), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
Compiling cpu task cast(x=float32(2, 256, 1280), y=float16(2, 256, 1280))...
Compiling cuda task fused(data=float16(2, 256, 3840), y=float16(16, 256, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task batch_matmul(a=float16(2, 256, 1280), b=float16(2, 1280, 3840), c=float16(2, 256, 3840), batch_size=2, m_size=256, n_size=3840, k_size=1280, mma='mma')...
Compiling cuda task fused(data=float16(2, 256, 3840), y=float16(16, 256, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 256, 3840), y=float16(16, 256, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cpu task cast(x=float32(16, 256, 160), y=float16(16, 256, 160))...
Compiling cuda task fused(b=float16(2, 1280, 1280), y=float16(1280,), x=float16(16, 256, 160), y=float16(2, 256, 1280), fused_ops='reshape rearrange reshape batch_matmul add divs', anchor='batch_matmul')...
Compiling cuda task fused(a=float16(2, 256, 1280), b=float16(2, 1280, 1280), y=float16(16, 256, 160), fused_ops='batch_matmul reshape rearrange reshape', anchor='batch_matmul')...
Compiling cuda task batch_matmul(a=float16(2, 77, 768), b=float16(2, 768, 2560), c=float16(2, 77, 2560), batch_size=2, m_size=77, n_size=2560, k_size=768, mma='mma')...
Compiling cuda task fused(data=float16(2, 77, 2560), y=float16(16, 77, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 77, 2560), y=float16(16, 77, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task rearrange(x=float16(10240, 1280), y=float16(1280, 10240))...
Compiling cuda task rearrange(x=float16(1280, 5120), y=float16(5120, 1280))...
Compiling cuda task broadcast(data=float16(1280, 10240), out=float16(2, 1280, 10240))...
Compiling cuda task broadcast(data=float16(5120, 1280), out=float16(2, 5120, 1280))...
Compiling cuda task fused(a=float16(2, 256, 1280), b=float16(2, 1280, 10240), y=float16(10240,), z=float16(2, 256, 10240), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cuda task fused(b=float16(2, 5120, 1280), y=float16(1280,), data=float16(2, 256, 10240), z=float16(2, 256, 1280), fused_ops='slice slice gelu mul batch_matmul add', anchor='batch_matmul')...
[2023-05-24 16:45:36,134] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 993
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 1280, 3, 3] and strides [11520, 1, 3840, 1280]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___resnets_0_conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-24 16:45:36,308] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 560
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


[2023-05-24 16:45:36,319] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __init__ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/container.py line 276
due to:
Traceback (most recent call last):
  File "/home/alpha/.local/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 134, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Guard setup for uninitialized class <class 'torch.nn.modules.container.ModuleList'>

Set torch._dynamo.config.verbose=True or TORCHDYNAMO_VERBOSE=1 for more information


Compiling cpu task cast(x=float32(2, 64, 1280), y=float16(2, 64, 1280))...
Compiling cuda task fused(data=float16(2, 64, 3840), y=float16(16, 64, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task batch_matmul(a=float16(2, 64, 1280), b=float16(2, 1280, 3840), c=float16(2, 64, 3840), batch_size=2, m_size=64, n_size=3840, k_size=1280, mma='mma')...
Compiling cuda task fused(data=float16(2, 64, 3840), y=float16(16, 64, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(data=float16(2, 64, 3840), y=float16(16, 64, 160), fused_ops='slice reshape rearrange reshape', anchor='reshape')...
Compiling cuda task rearrange(x=float16(2, 2, 640, 1280), y=float16(4, 640, 1280))...
Compiling cpu task cast(x=float32(16, 64, 160), y=float16(16, 64, 160))...
Compiling cuda task fused(x=float16(2, 64, 1280), y=float16(1280,), y=float16(2, 64, 1280), fused_ops='add divs', anchor='divs')...
Compiling cuda task reduce_sum_f16(x=float16(2, 2, 64, 1280), y=float16(2, 64, 1280), dims=[1], keep_dim=False, reduce_type=sum, accumulate_dtype='float32')...
Compiling cuda task fused(b=float16(4, 640, 1280), x=float16(16, 64, 160), y=float16(2, 2, 64, 1280), fused_ops='reshape rearrange reshape reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cuda task fused(x=float16(2, 64, 1280), y=float16(16, 64, 160), fused_ops='reshape rearrange reshape', anchor='reshape')...
Compiling cuda task fused(b=float16(4, 640, 1280), x=float16(2, 64, 1280), y=float16(2, 2, 64, 1280), fused_ops='reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
Compiling cuda task rearrange(x=float16(2, 2, 2560, 1280), y=float16(4, 2560, 1280))...
Compiling cuda task add(x=float16(2, 64, 1280), y=float16(1280,), z=float16(2, 64, 1280))...
Compiling cuda task fused(a=float16(2, 64, 1280), b=float16(2, 1280, 10240), y=float16(10240,), z=float16(2, 64, 10240), fused_ops='batch_matmul add', anchor='batch_matmul')...
Compiling cuda task fused(b=float16(4, 2560, 1280), data=float16(2, 64, 10240), y=float16(2, 2, 64, 1280), fused_ops='slice slice gelu mul reshape rearrange batch_matmul reshape', anchor='batch_matmul')...
[2023-05-24 16:46:15,915] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 2078
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 2560, 3, 3] and strides [23040, 1, 7680, 2560]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___resnets_0_conv1 with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-24 16:46:15,944] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py line 135
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 281, in _raise_exception
    raise type(exception)(
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
ValueError: from_dlpack: got tensor with shape [1280, 1280, 3, 3] and strides [11520, 1, 3840, 1280]. Only compact tensors are supported for hidet, please consider make it continuous before passing it to hidet., occurred when interpreting L__self___conv with
  HidetConv2d(tensor(...))
HidetConv2d is defined at
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/register_modules.py", line 52


[2023-05-24 16:46:16,376] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 1966
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention.BasicTransformerBlock'>


/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/dynamo_backends.py:67: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [03:28<00:00,  4.17s/it]
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [00:10<00:00,  4.85it/s]
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [00:10<00:00,  4.81it/s]

Hi @AlphaAtlas

For diffusers.models.attention.BasicTransformerBlock, this is not a hidet bug, but rather due to diffusers annotating the graph with external modules. You can use the following line to circumvent it:

torch._dynamo.disallow_in_graph(diffusers.models.attention.BasicTransformerBlock)

For the contiguous-tensor part, I still have not been able to reproduce it; perhaps you can make sure by explicitly setting:

model = model.to(memory_format=torch.contiguous_format)

In the case of a diffusers pipeline, do you mean pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format)?

In the case of a diffusers pipeline, do you mean pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format)?

Yes
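
Putting the two suggestions together, a minimal sketch might look like the following (assuming the pipe object from the test script above, already moved to CUDA):

import diffusers
import torch
import torch._dynamo

# Keep the externally-annotated diffusers block out of the captured fx graph,
# so dynamo runs it eagerly instead of handing it to the hidet backend.
torch._dynamo.disallow_in_graph(diffusers.models.attention.BasicTransformerBlock)

# Force a compact (row-major contiguous) layout before weights reach hidet.
pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format)

pipe.unet = torch.compile(pipe.unet, backend="hidet")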

I am doing some more testing locally, and am still getting a contiguous-memory error:

Long log, second run after populating a local compile cache
~/AI/difftest
venv โฏ python hidet_debug.py
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [00:09<00:00,  5.46it/s]
### torch.compile warmup:
  0%|                                                                             | 0/50 [00:00<?, ?it/s][2023-06-07 13:54:57,418] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_condition.py line 636
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/dynamo_backends.py:62: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
[2023-06-07 13:54:59,742] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 881
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


[2023-06-07 13:55:01,333] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/transformer_2d.py line 214
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


[2023-06-07 13:55:01,414] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention.py line 122
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


Compiling cuda task attn(q=float16(2, 8, 1024, 80), k=float16(2, 8, 80, 1024), v=float16(2, 8, 1024, 80), o=float16(2, 8, 1024, 80), is_causal=False)...
[2023-06-07 13:55:13,290] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py line 316
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 272, in build_task_batch
    raise RuntimeError('\n'.join(msg))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
RuntimeError: Failed to build 1 tasks:
  [cuda] attn(q=float16(2, 8, 1024, 80), k=float16(2, 8, 80, 1024), v=float16(2, 8, 1024, 80), o=float16(2, 8, 1024, 80), is_causal=False)

    Traceback (most recent call last):
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 248, in build_job
        build_task(task, target, load=False)
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 233, in build_task
        build_task_module(task, candidates, task_dir, target)
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 61, in build_task_module
        raise ValueError('No candidate found.')
    ValueError: No candidate found.



Compiling cuda task attn(q=float16(2, 8, 1024, 80), k=float16(2, 8, 80, 1024), v=float16(2, 8, 1024, 80), o=float16(2, 8, 1024, 80), is_causal=False)...
[2023-06-07 13:55:13,848] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __call__ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py line 1076
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 272, in build_task_batch
    raise RuntimeError('\n'.join(msg))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
RuntimeError: Failed to build 1 tasks:
  [cuda] attn(q=float16(2, 8, 1024, 80), k=float16(2, 8, 80, 1024), v=float16(2, 8, 1024, 80), o=float16(2, 8, 1024, 80), is_causal=False)

    Traceback (most recent call last):
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 248, in build_job
        build_task(task, target, load=False)
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 233, in build_task
        build_task_module(task, candidates, task_dir, target)
      File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/drivers/build_task.py", line 61, in build_task_module
        raise ValueError('No candidate found.')
    ValueError: No candidate found.



[2023-06-07 13:55:27,258] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 563
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


[2023-06-07 13:55:28,805] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT __init__ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/container.py line 276
due to:
Traceback (most recent call last):
  File "/home/alpha/.local/lib/python3.11/site-packages/torch/_dynamo/exc.py", line 134, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Guard setup for uninitialized class <class 'torch.nn.modules.container.ModuleList'>

Set torch._dynamo.config.verbose=True or TORCHDYNAMO_VERBOSE=1 for more information


[2023-06-07 13:55:35,377] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT forward /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py line 1969
due to:
Traceback (most recent call last):
  File "/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/interpreter.py", line 220, in _check_support
    raise NotImplementedError("\n".join(lines))
torch._dynamo.exc.BackendCompilerFailed: backend='hidet' raised:
NotImplementedError: The following modules/functions are not supported by hidet yet:
  <class 'diffusers.models.attention_processor.Attention'>


/home/alpha/AI/difftest/venv/lib/python3.11/site-packages/hidet/graph/frontend/torch/dynamo_backends.py:62: UserWarning: Hidet received a non-contiguous torch input tensor, converting it to contiguous
  warnings.warn_once('Hidet received a non-contiguous torch input tensor, converting it to contiguous')
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 50/50 [01:24<00:00,  1.69s/it]
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ Traceback (most recent call last) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ /home/alpha/AI/difftest/hidet_debug.py:45 in <module>                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   42 pipe.unet = torch.compile(pipe.unet, backend="hidet")                                       โ”‚
โ”‚   43                                                                                             โ”‚
โ”‚   44 print("### torch.compile warmup:")                                                          โ”‚
โ”‚ โฑ 45 image = pipe("a photo of an astronaut riding a horse on mars").images[0]                    โ”‚
โ”‚   46                                                                                             โ”‚
โ”‚   47 print("torch.compile benchmark:")                                                           โ”‚
โ”‚   48 image = pipe("a photo of an astronaut riding a horse on mars").images[0]                    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py:115 in                โ”‚
โ”‚ decorate_context                                                                                 โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   112 โ”‚   @functools.wraps(func)                                                                 โ”‚
โ”‚   113 โ”‚   def decorate_context(*args, **kwargs):                                                 โ”‚
โ”‚   114 โ”‚   โ”‚   with ctx_factory():                                                                โ”‚
โ”‚ โฑ 115 โ”‚   โ”‚   โ”‚   return func(*args, **kwargs)                                                   โ”‚
โ”‚   116 โ”‚                                                                                          โ”‚
โ”‚   117 โ”‚   return decorate_context                                                                โ”‚
โ”‚   118                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/pipelines/stable_diffusion/p โ”‚
โ”‚ ipeline_stable_diffusion.py:755 in __call__                                                      โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   752 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   callback(i, t, latents)                                            โ”‚
โ”‚   753 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   754 โ”‚   โ”‚   if not output_type == "latent":                                                    โ”‚
โ”‚ โฑ 755 โ”‚   โ”‚   โ”‚   image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dic   โ”‚
โ”‚   756 โ”‚   โ”‚   โ”‚   image, has_nsfw_concept = self.run_safety_checker(image, device, prompt_embe   โ”‚
โ”‚   757 โ”‚   โ”‚   else:                                                                              โ”‚
โ”‚   758 โ”‚   โ”‚   โ”‚   image = latents                                                                โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py:46 โ”‚
โ”‚ in wrapper                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   43 โ”‚   def wrapper(self, *args, **kwargs):                                                     โ”‚
โ”‚   44 โ”‚   โ”‚   if hasattr(self, "_hf_hook") and hasattr(self._hf_hook, "pre_forward"):             โ”‚
โ”‚   45 โ”‚   โ”‚   โ”‚   self._hf_hook.pre_forward(self)                                                 โ”‚
โ”‚ โฑ 46 โ”‚   โ”‚   return method(self, *args, **kwargs)                                                โ”‚
โ”‚   47 โ”‚                                                                                           โ”‚
โ”‚   48 โ”‚   return wrapper                                                                          โ”‚
โ”‚   49                                                                                             โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py:191 โ”‚
โ”‚ in decode                                                                                        โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   188 โ”‚   โ”‚   โ”‚   decoded_slices = [self._decode(z_slice).sample for z_slice in z.split(1)]      โ”‚
โ”‚   189 โ”‚   โ”‚   โ”‚   decoded = torch.cat(decoded_slices)                                            โ”‚
โ”‚   190 โ”‚   โ”‚   else:                                                                              โ”‚
โ”‚ โฑ 191 โ”‚   โ”‚   โ”‚   decoded = self._decode(z).sample                                               โ”‚
โ”‚   192 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   193 โ”‚   โ”‚   if not return_dict:                                                                โ”‚
โ”‚   194 โ”‚   โ”‚   โ”‚   return (decoded,)                                                              โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py:178 โ”‚
โ”‚ in _decode                                                                                       โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   175 โ”‚   โ”‚   โ”‚   return self.tiled_decode(z, return_dict=return_dict)                           โ”‚
โ”‚   176 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   177 โ”‚   โ”‚   z = self.post_quant_conv(z)                                                        โ”‚
โ”‚ โฑ 178 โ”‚   โ”‚   dec = self.decoder(z)                                                              โ”‚
โ”‚   179 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   180 โ”‚   โ”‚   if not return_dict:                                                                โ”‚
โ”‚   181 โ”‚   โ”‚   โ”‚   return (dec,)                                                                  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1502 in               โ”‚
โ”‚ _wrapped_call_impl                                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1499 โ”‚   โ”‚   if self._compiled_call_impl is not None:                                          โ”‚
โ”‚   1500 โ”‚   โ”‚   โ”‚   return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]        โ”‚
โ”‚   1501 โ”‚   โ”‚   else:                                                                             โ”‚
โ”‚ โฑ 1502 โ”‚   โ”‚   โ”‚   return self._call_impl(*args, **kwargs)                                       โ”‚
โ”‚   1503 โ”‚                                                                                         โ”‚
โ”‚   1504 โ”‚   def _call_impl(self, *args, **kwargs):                                                โ”‚
โ”‚   1505 โ”‚   โ”‚   forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1511 in _call_impl    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1508 โ”‚   โ”‚   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   โ”‚
โ”‚   1509 โ”‚   โ”‚   โ”‚   โ”‚   or _global_backward_pre_hooks or _global_backward_hooks                   โ”‚
โ”‚   1510 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1511 โ”‚   โ”‚   โ”‚   return forward_call(*args, **kwargs)                                          โ”‚
โ”‚   1512 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1513 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1514 โ”‚   โ”‚   backward_pre_hooks = []                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/vae.py:270 in forward โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   267 โ”‚   โ”‚   โ”‚                                                                                  โ”‚
โ”‚   268 โ”‚   โ”‚   โ”‚   # up                                                                           โ”‚
โ”‚   269 โ”‚   โ”‚   โ”‚   for up_block in self.up_blocks:                                                โ”‚
โ”‚ โฑ 270 โ”‚   โ”‚   โ”‚   โ”‚   sample = up_block(sample, latent_embeds)                                   โ”‚
โ”‚   271 โ”‚   โ”‚                                                                                      โ”‚
โ”‚   272 โ”‚   โ”‚   # post-process                                                                     โ”‚
โ”‚   273 โ”‚   โ”‚   if latent_embeds is None:                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1502 in               โ”‚
โ”‚ _wrapped_call_impl                                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1499 โ”‚   โ”‚   if self._compiled_call_impl is not None:                                          โ”‚
โ”‚   1500 โ”‚   โ”‚   โ”‚   return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]        โ”‚
โ”‚   1501 โ”‚   โ”‚   else:                                                                             โ”‚
โ”‚ โฑ 1502 โ”‚   โ”‚   โ”‚   return self._call_impl(*args, **kwargs)                                       โ”‚
โ”‚   1503 โ”‚                                                                                         โ”‚
โ”‚   1504 โ”‚   def _call_impl(self, *args, **kwargs):                                                โ”‚
โ”‚   1505 โ”‚   โ”‚   forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1511 in _call_impl    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1508 โ”‚   โ”‚   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   โ”‚
โ”‚   1509 โ”‚   โ”‚   โ”‚   โ”‚   or _global_backward_pre_hooks or _global_backward_hooks                   โ”‚
โ”‚   1510 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1511 โ”‚   โ”‚   โ”‚   return forward_call(*args, **kwargs)                                          โ”‚
โ”‚   1512 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1513 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1514 โ”‚   โ”‚   backward_pre_hooks = []                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py:216 โ”‚
โ”‚ 4 in forward                                                                                     โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   2161 โ”‚   โ”‚                                                                                     โ”‚
โ”‚   2162 โ”‚   โ”‚   if self.upsamplers is not None:                                                   โ”‚
โ”‚   2163 โ”‚   โ”‚   โ”‚   for upsampler in self.upsamplers:                                             โ”‚
โ”‚ โฑ 2164 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states = upsampler(hidden_states)                                  โ”‚
โ”‚   2165 โ”‚   โ”‚                                                                                     โ”‚
โ”‚   2166 โ”‚   โ”‚   return hidden_states                                                              โ”‚
โ”‚   2167                                                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1502 in               โ”‚
โ”‚ _wrapped_call_impl                                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1499 โ”‚   โ”‚   if self._compiled_call_impl is not None:                                          โ”‚
โ”‚   1500 โ”‚   โ”‚   โ”‚   return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]        โ”‚
โ”‚   1501 โ”‚   โ”‚   else:                                                                             โ”‚
โ”‚ โฑ 1502 โ”‚   โ”‚   โ”‚   return self._call_impl(*args, **kwargs)                                       โ”‚
โ”‚   1503 โ”‚                                                                                         โ”‚
โ”‚   1504 โ”‚   def _call_impl(self, *args, **kwargs):                                                โ”‚
โ”‚   1505 โ”‚   โ”‚   forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1511 in _call_impl    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1508 โ”‚   โ”‚   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   โ”‚
โ”‚   1509 โ”‚   โ”‚   โ”‚   โ”‚   or _global_backward_pre_hooks or _global_backward_hooks                   โ”‚
โ”‚   1510 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1511 โ”‚   โ”‚   โ”‚   return forward_call(*args, **kwargs)                                          โ”‚
โ”‚   1512 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1513 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1514 โ”‚   โ”‚   backward_pre_hooks = []                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/AI/difftest/venv/lib/python3.11/site-packages/diffusers/models/resnet.py:168 in      โ”‚
โ”‚ forward                                                                                          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   165 โ”‚   โ”‚   # TODO(Suraj, Patrick) - clean up after weight dicts are correctly renamed         โ”‚
โ”‚   166 โ”‚   โ”‚   if self.use_conv:                                                                  โ”‚
โ”‚   167 โ”‚   โ”‚   โ”‚   if self.name == "conv":                                                        โ”‚
โ”‚ โฑ 168 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states = self.conv(hidden_states)                                   โ”‚
โ”‚   169 โ”‚   โ”‚   โ”‚   else:                                                                          โ”‚
โ”‚   170 โ”‚   โ”‚   โ”‚   โ”‚   hidden_states = self.Conv2d_0(hidden_states)                               โ”‚
โ”‚   171                                                                                            โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1502 in               โ”‚
โ”‚ _wrapped_call_impl                                                                               โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1499 โ”‚   โ”‚   if self._compiled_call_impl is not None:                                          โ”‚
โ”‚   1500 โ”‚   โ”‚   โ”‚   return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]        โ”‚
โ”‚   1501 โ”‚   โ”‚   else:                                                                             โ”‚
โ”‚ โฑ 1502 โ”‚   โ”‚   โ”‚   return self._call_impl(*args, **kwargs)                                       โ”‚
โ”‚   1503 โ”‚                                                                                         โ”‚
โ”‚   1504 โ”‚   def _call_impl(self, *args, **kwargs):                                                โ”‚
โ”‚   1505 โ”‚   โ”‚   forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/module.py:1511 in _call_impl    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚   1508 โ”‚   โ”‚   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   โ”‚
โ”‚   1509 โ”‚   โ”‚   โ”‚   โ”‚   or _global_backward_pre_hooks or _global_backward_hooks                   โ”‚
โ”‚   1510 โ”‚   โ”‚   โ”‚   โ”‚   or _global_forward_hooks or _global_forward_pre_hooks):                   โ”‚
โ”‚ โฑ 1511 โ”‚   โ”‚   โ”‚   return forward_call(*args, **kwargs)                                          โ”‚
โ”‚   1512 โ”‚   โ”‚   # Do not call functions when jit is used                                          โ”‚
โ”‚   1513 โ”‚   โ”‚   full_backward_hooks, non_full_backward_hooks = [], []                             โ”‚
โ”‚   1514 โ”‚   โ”‚   backward_pre_hooks = []                                                           โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/conv.py:463 in forward          โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    460 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   self.padding, self.dilation, self.groups)                         โ”‚
โ”‚    461 โ”‚                                                                                         โ”‚
โ”‚    462 โ”‚   def forward(self, input: Tensor) -> Tensor:                                           โ”‚
โ”‚ โฑ  463 โ”‚   โ”‚   return self._conv_forward(input, self.weight, self.bias)                          โ”‚
โ”‚    464                                                                                           โ”‚
โ”‚    465 class Conv3d(_ConvNd):                                                                    โ”‚
โ”‚    466 โ”‚   __doc__ = r"""Applies a 3D convolution over an input signal composed of several inpu  โ”‚
โ”‚                                                                                                  โ”‚
โ”‚ /home/alpha/.local/lib/python3.11/site-packages/torch/nn/modules/conv.py:459 in _conv_forward    โ”‚
โ”‚                                                                                                  โ”‚
โ”‚    456 โ”‚   โ”‚   โ”‚   return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=sel  โ”‚
โ”‚    457 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   weight, bias, self.stride,                                    โ”‚
โ”‚    458 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   _pair(0), self.dilation, self.groups)                         โ”‚
โ”‚ โฑ  459 โ”‚   โ”‚   return F.conv2d(input, weight, bias, self.stride,                                 โ”‚
โ”‚    460 โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   self.padding, self.dilation, self.groups)                         โ”‚
โ”‚    461 โ”‚                                                                                         โ”‚
โ”‚    462 โ”‚   def forward(self, input: Tensor) -> Tensor:                                           โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 2.21 GiB
Requested               : 128.00 MiB
Device limit            : 5.79 GiB
Free (according to CUDA): 16.00 MiB
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB
But I found the OOM at the end particularly interesting:
OutOfMemoryError: Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated     : 2.21 GiB
Requested               : 128.00 MiB
Device limit            : 5.79 GiB
Free (according to CUDA): 16.00 MiB
PyTorch limit (set by user-supplied memory fraction)
                        : 17179869184.00 GiB

OOM with only 2.21 GiB allocated? That is some memory fragmentation there 👀. The GPU is otherwise empty; it's not even rendering any display output.
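
If it is fragmentation, one way to check (a sketch, not part of the original script) is to dump the caching allocator's state right before the failing VAE decode and, optionally, release cached blocks:

import torch

# Allocated vs. reserved bytes, plus the count of non-releasable segments,
# give a rough picture of how fragmented the caching allocator has become.
print(torch.cuda.memory_summary(device=0))

# Returning cached-but-unused blocks to the driver sometimes frees enough
# headroom for the next large allocation.
torch.cuda.empty_cache()

Setting the environment variable PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launch is another commonly suggested mitigation for allocator fragmentation.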

Anyway, I am going to see if I can replicate my issues in a Colab notebook, so they are easily accessible to y'all.

Here is the Colab notebook that reproduces the non-contiguous tensor error on a T4, even with pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format)

https://github.com/AlphaAtlas/Diffusion-Testing/blob/main/Hidet_Debug_Repro.ipynb

It seems that the error below is gone.

ValueError: from_dlpack: got tensor with shape [320, 320, 3, 3] and strides [2880, 1, 960, 320]. Only compact tensors are supported for hidet, 

There is just one extra traceback in the log, which should be solved by the line below, for the same reason as before:

torch._dynamo.disallow_in_graph(diffusers.models.attention_processor.Attention)
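
Concretely, alongside the earlier BasicTransformerBlock call, the full set of exclusions hit in these logs would be (a sketch, assuming diffusers is imported):

import diffusers
import torch._dynamo

# Both externally-annotated diffusers modules seen in these runs; dynamo will
# execute them eagerly rather than passing them to the hidet backend.
torch._dynamo.disallow_in_graph(diffusers.models.attention.BasicTransformerBlock)
torch._dynamo.disallow_in_graph(diffusers.models.attention_processor.Attention)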

The memory error may be related to #217; I can look into this.

Closing this issue as the original compilation errors are now resolved:

  • Compilation for T4: fixed by #227
  • Scaled dot product attention mapping: fixed by #238
  • Channels-last memory format and custom modules in the fx graph are not supported; they can be disabled from dynamo using pipe.unet = pipe.unet.to(memory_format=torch.contiguous_format) and torch._dynamo.disallow_in_graph(diffusers.models.attention_processor.Attention)