wenqsun/DimensionX

orbit lora gives garbled video

Closed this issue · 4 comments

I'm running this in a Google Colab A100.
Slightly modified requirements:

!pip install "diffusers>=0.31.0"
!pip install "accelerate>=1.0.1"
!pip install "transformers>=4.46.1"
# !pip install numpy==1.26.0
!pip install "torch>=2.5.0"
!pip install "torchvision>=0.20.0"
!pip install "sentencepiece>=0.2.0"
!pip install "SwissArmyTransformer>=0.4.12"
!pip install "gradio>=5.4.0"
!pip install "imageio>=2.35.1"
!pip install "imageio-ffmpeg>=0.5.1"
!pip install "openai>=1.53.0"
!pip install "moviepy>=1.0.3"
!pip install "scikit-video>=1.1.11"

(Note the quotes: an unquoted `>=` is treated as a shell redirect by Colab's `!` commands.) For the model to run with a LoRA in under 40 GB of VRAM, I have to enable at least one of:

pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
pipe.enable_sequential_cpu_offload()

If I don't load the LoRA, a simple zoom is produced, which looks fine.

Here's the pipeline, after loading the orbit-left LoRA:

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
lora_path = str(INSTALL_PATH / "orbit_left_lora_weights.safetensors")
pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
pipe.to("cuda")

Then I run it as in the demo:

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], str(movie_path/"tvideo_test.mp4"), fps=8)

but the resulting video is just noisy blocks, as if not enough steps had been run.

You need to set lora_scale=1 / lora_rank. Specifically, you should modify your code like this:

lora_rank = 256
pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
pipe.fuse_lora(lora_scale=1 / lora_rank)
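
For intuition about why the scale matters, here is a toy numpy sketch (not the diffusers internals): fusing a LoRA merges lora_scale * B @ A into the base weight, so a scale of 1 / lora_rank shrinks the low-rank update by the rank.

import numpy as np

rank = 256
d = 8                          # toy feature size
W = np.random.randn(d, d)      # base weight
B = np.random.randn(d, rank)   # LoRA "up" matrix
A = np.random.randn(rank, d)   # LoRA "down" matrix

def fuse(W, A, B, lora_scale):
    # Fusing merges the scaled low-rank update into the base weight.
    return W + lora_scale * (B @ A)

fused = fuse(W, A, B, 1.0 / rank)
# The merged update is 1/rank of the raw low-rank product.
assert np.allclose(fused - W, (B @ A) / rank)

With lora_scale=1.0 the update would be 256x larger, which matches the noisy-blocks symptom of a badly over-scaled LoRA.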

When I change to that, I get:

Loading pipeline components...: 100% 5/5 [02:38<00:00, 34.36s/it]
Loading checkpoint shards: 100% 2/2 [01:00<00:00, 29.89s/it]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-4-ca26d16f6681>](https://localhost:8080/#) in <cell line: 20>()
     18 lora_rank = 256
     19 pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
---> 20 pipe.fuse_lora(lora_scale=1 / lora_rank) 
     21 
     22 pipe.to("cuda")

1 frames
[/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_base.py](https://localhost:8080/#) in fuse_lora(self, components, lora_scale, safe_fusing, adapter_names, **kwargs)
    443         for fuse_component in components:
    444             if fuse_component not in self._lora_loadable_modules:
--> 445                 raise ValueError(f"{fuse_component} is not found in {self._lora_loadable_modules=}.")
    446 
    447             model = getattr(self, fuse_component, None)

ValueError: text_encoder is not found in self._lora_loadable_modules=['transformer'].

You can take a look at this issue.

Got it. For those working in Colab, add this line after installing the packages (it patches lora_base.py to skip the text_encoder check):

!sed -z "s/for fuse_component in components:/for fuse_component in components:\n            if fuse_component == 'text_encoder':\n                continue\n/g" -i /usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_base.py
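
That one-liner inserts a `continue` for text_encoder right after the loop header in fuse_lora. If you want to see what it does without touching the installed package, you can run the same substitution on a scratch copy of the matched lines (a toy reproduction, not the real lora_base.py):

# Copy the matched loop header into a scratch file, apply the same
# substitution, and show the inserted skip for text_encoder.
cat > /tmp/lora_base_snippet.py <<'EOF'
        for fuse_component in components:
            if fuse_component not in self._lora_loadable_modules:
                raise ValueError("...")
EOF
sed -z "s/for fuse_component in components:/for fuse_component in components:\n            if fuse_component == 'text_encoder':\n                continue\n/g" -i /tmp/lora_base_snippet.py
grep -n "continue" /tmp/lora_base_snippet.py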