What is the 1024p image checkpoint? Is it a text-to-image model, or is it an image-to-video model?

Answer 1 · 2024-10-30T16:18:14.000Z

It is a text-to-image model. Please check image_generation_demo.ipynb for its usage.

Answer 2 · 2024-10-31T05:44:31.000Z

You can still generate higher-resolution videos using the 384p image-to-video model. For example, if you input a 1024x576 resolution image, the output video will maintain that same resolution.

To enable this, modify the following line so that width and height match your input image (for example, 1024 and 576):
image = image.resize((width, height))
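A minimal sketch for choosing the width and height passed to that resize call. `snap_resolution` is a hypothetical helper, not part of the repository. It keeps the source aspect ratio and rounds both sides to a multiple of 64. The multiple of 64 is an assumption (1024x576 satisfies it), so check the repository for the granularity the model actually expects.

```python
def snap_resolution(src_w: int, src_h: int, target_h: int, multiple: int = 64) -> tuple:
    """Hypothetical helper: scale (src_w, src_h) to about target_h pixels tall,
    keep the aspect ratio, and round both sides to a multiple of `multiple`.
    The default of 64 is an assumption, not a documented requirement."""

    def snap(x: float) -> int:
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, round(x / multiple) * multiple)

    new_w = src_w * target_h / src_h  # preserve the source aspect ratio
    return snap(new_w), snap(target_h)


# Example: a 1920x1080 source snapped to 576 pixels tall gives a 1024x576 target.
width, height = snap_resolution(1920, 1080, 576)
print(width, height)  # 1024 576

# With Pillow, the resize line from the thread would then be:
#   image = image.resize((width, height))
```

Note that Pillow's `Image.resize` returns a new image rather than modifying the original, which is why the result is assigned back to `image`.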

Answer 3 · 2024-10-31T05:49:55.000Z

Great observation. This relies entirely on RoPE's extrapolation capability. We have observed the same behavior with images, but have not tested it for video generation.
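A minimal sketch of why RoPE allows this at all. Rotary position embedding rotates each pair of query/key channels by an angle proportional to the token's position index. As a result, it produces a well-defined encoding at any index, including indices larger than any seen during training. Whether the model's outputs remain good at those positions is what "extrapolation" refers to. The sketch assumes the standard 1D formulation with base 10000, and the vectors are arbitrary toy values. The thread does not say which RoPE variant the video model uses; video models often apply it separately along time, height, and width.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Apply 1D rotary position embedding to vector x at integer position pos.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by angle pos * base**(-2i/d).
    """
    d = len(x)
    if d % 2:
        raise ValueError("dimension must be even")
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]  # toy query vector
k = [1.0, 0.4, -0.7, 0.2]  # toy key vector

# The score depends only on the offset between positions, so a pair at
# positions (3, 1) scores the same as a pair at (103, 101), even if 103 lies
# beyond a (hypothetical) training range.
near = dot(rope(q, 3), rope(k, 1))
far = dot(rope(q, 103), rope(k, 101))
print(math.isclose(near, far, rel_tol=1e-9, abs_tol=1e-9))  # True
```

The encoding is always defined, but the network has only learned to interpret the positions (and position combinations) it saw in training. This is why larger inputs can work, yet this is not guaranteed.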