aws-neuron/aws-neuron-sdk

[HF][Optimum] Compiling unet in stable diffusion XL pipeline failed since Neuron SDK 2.18

JingyaHuang opened this issue · 8 comments

Hi team, when trying to bump Optimum Neuron to the latest Neuron SDK 2.18 release, we noticed that the compilation of the UNet in the SDXL pipeline fails with the latest compiler. Here are more details about the regression:

  • System information
OS: Ubuntu 20.04.5 LTS
  • Neuron packages (apt)
aws-neuronx-collectives/unknown,now 2.20.22.0-c101c322e amd64 [installed]
aws-neuronx-dkms/unknown,now 2.16.7.0 amd64 [installed]
aws-neuronx-runtime-lib/unknown,now 2.20.22.0-1b3ca6425 amd64 [installed]
aws-neuronx-tools/unknown,now 2.17.1.0 amd64 [installed]
  • Pip installed
aws-neuronx-runtime-discovery 2.9
diffusers                     0.27.2
libneuronxla                  0.5.971
neuronx-cc                    2.13.66.0+6dfecc895
numpy                         1.24.4
optimum                       1.18.0
optimum-neuron                0.0.21.dev0
torch                         1.13.1
torch-neuronx                 1.13.1.1.14.0
torch-xla                     1.13.1+torchneurone
torchvision                   0.14.1
transformers                  4.36.2
  • Error log
=== BIR verification failed ===
Reason: Pattern accesses 48 (> 32) partitions starting at partition 32
Instruction: I-36948
Opcode: GenericCopy
Output index: 0
Argument AP:
Access Pattern: [[1,48],[1,1],[1,1]]
SymbolicAP
Memory Location: {concatenate.3_set}@SB
2024-04-03T09:11:19Z 
2024-04-03T09:11:19Z Diagnostic information:
2024-04-03T09:11:19Z   NeuronX Compiler version 2.13.66.0+6dfecc895
2024-04-03T09:11:19Z   
2024-04-03T09:11:19Z   Python version 3.8.10
2024-04-03T09:11:19Z   HWM version 2.13.66.0+6dfecc895
2024-04-03T09:11:19Z   NumPy version 1.24.4
2024-04-03T09:11:19Z   
2024-04-03T09:11:19Z   Running on AMI ami-09cd747c78a9add63
2024-04-03T09:11:19Z   Running in region use1-az6
2024-04-03T09:11:19Z 
2024-04-03T09:11:19Z Diagnostic logs stored in /home/ubuntu/optimum-neuron/log-neuron-cc.txt
An error occured when trying to trace unet with the error message: neuronx-cc failed with 70.
The export is failed and unet neuron model won't be stored.
An error occured when trying to trace unet with the error message: neuronx-cc failed with 70.
  • Reproduction
from optimum.neuron import NeuronStableDiffusionXLPipeline


# [Export]
model_id = "echarlaix/tiny-random-stable-diffusion-xl"
num_images_per_prompt = 1
input_shapes = {"batch_size": 1, "height": 64, "width": 64, "num_images_per_prompt": num_images_per_prompt}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}

# Compile and save
stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id, export=True, **compiler_args, **input_shapes
)

save_directory = "tiny_sdxl_neuronx/"
stable_diffusion.save_pretrained(save_directory)

The test above works as expected with Neuron SDK 2.17.1.

I also tried with a PyTorch 2.1.2 setup; it doesn't work either.

Hi Jingya, I'm trying to reproduce the problem. I installed optimum and optimum-neuron with
pip install "optimum[neuronx, diffusers]"
based on https://huggingface.co/docs/optimum-neuron/tutorials/stable_diffusion.
However this seems to install v0.0.3, which doesn't include NeuronStableDiffusionXLPipeline. I also tried downgrading to 0.0.2, which has another problem. Is this expected with these versions, and is there a way to get 0.0.21? Thanks.

The installation with the neuronx extra is what we are going to fix with the 0.0.21 optimum-neuron release. For now, to install the latest optimum-neuron release (0.0.20), could you try:

pip install optimum==1.18.0
pip install optimum-neuron==0.0.20

Or the 0.0.21 dev version can be installed from source:

pip install git+https://github.com/huggingface/optimum-neuron

Then you can install diffusers with pip install diffusers.

Thanks Jingya, I updated optimum-neuron and diffusers and now I can reproduce the issue.

Hi Jingya, I found that the issue can be prevented if we set inline_weights_to_neff=True when tracing the UNet. Would that be a sufficient workaround for now? I will also look into the root cause but that may take some time.
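
For reference, a minimal sketch of the workaround at the torch_neuronx level; the module and example inputs below are placeholders standing in for the real UNet and its sample inputs:

import torch
import torch_neuronx

# Placeholder module standing in for the SDXL UNet; the real export traces the
# actual UNet with its latent/timestep/encoder-hidden-state inputs.
class UNetStub(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)

    def forward(self, sample):
        return self.conv(sample)

example_inputs = (torch.rand(1, 4, 8, 8),)

# Keeping the weights inlined into the NEFF (rather than separating them out)
# avoids the BIR verification failure reported above.
traced = torch_neuronx.trace(
    UNetStub(),
    example_inputs,
    inline_weights_to_neff=True,
)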

Hi @aws-bhegedus, thanks for investigating it!

Optimum Neuron could force setting inline_weights_to_neff=True for SDXL models for now. But given that our caching mechanism relies on the NEFF/weights separation, we won't be able to cache and load SDXL models (whose compilation takes a long time).
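
Roughly what we could expose to users, assuming we forward an inline_weights_to_neff keyword through from_pretrained to the exporter (a sketch of a possible API, not something already released):

from optimum.neuron import NeuronStableDiffusionXLPipeline

# Sketch of a possible API: forwarding inline_weights_to_neff through from_pretrained
# is an assumption, not a released optimum-neuron feature; it would work around the
# compiler error at the cost of disabling the compilation cache for this model.
stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained(
    "echarlaix/tiny-random-stable-diffusion-xl",
    export=True,
    inline_weights_to_neff=True,
    batch_size=1,
    height=64,
    width=64,
    num_images_per_prompt=1,
)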

Thanks Jingya, we will have a fix in a future release to allow enabling caching.
Does this problem only occur with the tiny random SDXL model? I'm curious about SDXL-base, which I believe is larger and takes longer to compile, so it may be a bigger problem.

Thanks @aws-bhegedus, that will be awesome!

tiny-random-stable-diffusion-xl is a smaller version (fewer layers, random weights) of the SDXL models in the pipeline that we built to shorten testing time. If the compilation fails for the tiny version, it's very unlikely to work for the larger pretrained checkpoint. And since compiling all SDXL components takes more than an hour, the lack of caching could be a bit discouraging for first-time users.