jy0205/Pyramid-Flow

What is the 1024p image checkpoint? Is it a text-to-image model, or is it an image-to-video model?

Closed this issue · 3 comments

It is a text-to-image model. Please check image_generation_demo.ipynb for its usage.

You can still generate higher-resolution videos with the 384p image-to-video model. For example, if you input a 1024x576 image, the output video keeps that same resolution.

To enable this, you can modify the following code here:
image.resize((width, height))
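
For anyone trying this, here is a minimal sketch of one way to pick the `(width, height)` passed to that resize call. The helper name `target_size`, the `long_side` default, and the rounding to a multiple of 16 are my own assumptions for illustration, not something the repo requires; check what the model actually accepts before relying on it.

```python
def target_size(src_w, src_h, long_side=1024, multiple=16):
    """Scale (src_w, src_h) so the longer side equals `long_side`.

    Both sides are rounded to the nearest multiple of `multiple`.
    The divisibility constraint is an assumption, not a documented requirement.
    """
    scale = long_side / max(src_w, src_h)

    def snap(x):
        # Round the scaled side to the nearest multiple, never below one multiple.
        return max(multiple, int(round(x * scale / multiple)) * multiple)

    return snap(src_w), snap(src_h)


# Example: a 1920x1080 source frame
width, height = target_size(1920, 1080)
print(width, height)

# With a PIL image loaded elsewhere, the resize line would then read:
# image = image.resize((width, height))
```

Note that PIL's `resize` returns a new image rather than modifying the original, so the result has to be assigned back.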

Great observation. That would rely entirely on RoPE's extrapolation capabilities. We have also seen this behavior on images, but haven't tested it for video generation.