THUDM/CogVideo

How to run CogVideoX1.5-5B-I2V as int8?


I am running it as shown below, but it still uses 22 GB of VRAM and is very slow on an RTX 3090.

What am I doing wrong?



import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

# Quantize each component's weights to int8 (weight-only) via torchao
quantization = int8_weight_only

text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="text_encoder",
                                              torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="transformer",
                                                          torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)

# Offload idle components to CPU and tile/slice the VAE to reduce peak VRAM
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

prompt = "a fast car"
image = load_image(image="input.png")
video = pipe(
    prompt=prompt,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=24,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

export_to_video(video, "output.mp4", fps=8)
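For intuition (my own back-of-the-envelope arithmetic, not from this thread), int8 weight-only quantization should roughly halve the weight memory of the ~5B-parameter transformer relative to bf16; the remainder of the 22 GB comes mostly from activations, which scale with resolution and num_frames:

```python
# Rough weight-memory estimate for a ~5B-parameter model (parameter count assumed).
params = 5_000_000_000

bf16_bytes = params * 2  # bf16: 2 bytes per weight
int8_bytes = params * 1  # int8 weight-only: 1 byte per weight

print(f"bf16 weights: ~{bf16_bytes / 1e9:.0f} GB")  # ~10 GB
print(f"int8 weights: ~{int8_bytes / 1e9:.0f} GB")  # ~5 GB
```

This is why lowering height/width/num_frames (as in the next snippet) helps far more than quantization alone once the weights are already int8.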

Running it this way brought usage down to 8 GB. Do I set the video FPS in the pipe?

prompt = "a fast car"
image = load_image(image="input.png")
video = pipe(
    prompt=prompt,
    height=480,
    width=720,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=12,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
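A side note on the FPS question above: in this snippet, fps is only applied at export time via export_to_video; the pipeline call just decides how many frames to generate. So the clip length follows from simple arithmetic (my own check, using the values in the snippets here):

```python
# num_frames is set on the pipeline call; fps is passed to export_to_video.
num_frames = 12  # frames generated by the pipeline above
fps = 8          # playback rate used at export time

duration_s = num_frames / fps
print(f"clip length: {duration_s:.1f} s")  # 1.5 s
```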

Yes, it is necessary; otherwise the default is 49 frames, i.e. 8 * 6 + 1 (this does not apply to CogVideoX1.5-5B). Please adjust and run each parameter according to cli_demo. Thank you.
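If it helps to unpack the maintainer's 8 * 6 + 1 figure: the default of 49 frames decomposes as fps * seconds + 1 (a 6-second clip at 8 fps, plus one frame). CogVideoX1.5-5B uses different defaults, per the comment above. A quick check of the arithmetic:

```python
# Default frame count for the original CogVideoX: fps * seconds + 1
fps, seconds = 8, 6
default_frames = fps * seconds + 1
print(default_frames)  # 49
```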

@zRzRzRzRzRzRzR Amazing work! I got it running on Windows with huge speed and low VRAM.

However, how we should prompt is still a mystery. Can you guide me?

Here is an example.

i used this prompt : A highly detailed, majestic dragon, with shimmering orange and white scales, slowly turns its head to gaze intently with a piercing golden eye, as glowing embers drift softly in the air around it, creating a magical, slightly mysterious atmosphere in a blurred forest background.

[Attached video: video_0004.mp4 (dragon)]