jy0205/Pyramid-Flow

Setting seed values, creating ending variations, my experiments.

It would be good to have the ability to set seeds.

I've added this to my version. My code is too hacky and special-purpose to share directly, but I thought it might be worth mentioning in the hope that future code changes don't break the ability to set the seeds.

In particular, I have a method to change the seed as the video progresses. This is handy because you can use a different seed at the beginning than at the end: it lets you get a good start to the video and then experiment with different endings.

For those interested, I'll explain what you need to do; it's a bit of a long post. :)


My basic method is (there's a sketch of this loop right after the list):
1. Start with an input image.
2. Generate 4 different 1-second videos for that image, each with a different seed.
3. Take the best one and remember its seed.
4. For that video, generate 4 different 2-second videos: the first second uses the original seed, the next second uses a different seed for each version.
5. Choose the best 2-second video, and continue the procedure for 3 seconds, 4 seconds, etc.
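
To make the idea concrete, here's a minimal sketch of that search loop. The render() and candidate_extensions() names are hypothetical stand-ins for my modified generate_image_to_video(), not code from the repo:

import random

def render(image, prompt, seeds):
    # Hypothetical wrapper: would call the modified generate_image_to_video()
    # with the right sub_duration and seeds[0..3] as seed_value0..seed_value3.
    raise NotImplementedError

def candidate_extensions(image, prompt, kept_seeds, n=4):
    # Try n fresh seeds for the next one-second segment, keeping the
    # already-chosen seeds so the start of the video is unchanged.
    results = []
    for _ in range(n):
        seeds = kept_seeds + [random.randint(0, 2**31 - 1)]
        results.append((seeds, render(image, prompt, seeds)))
    return results  # inspect by eye, keep the best, repeat one second longer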

In my code, I first add the seed values as globals at the top of app.py, together with a small helper so the same seeding code isn't repeated everywhere:

import random
import numpy as np
import torch

def set_all_seeds(seed_value):
    # Set the seed for Python, NumPy and PyTorch (CPU and all GPUs)
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)
    torch.backends.cudnn.deterministic = True  # ensure reproducibility on GPU (if CUDA is used)
    torch.backends.cudnn.benchmark = False

seed_value = 1000
set_all_seeds(seed_value)
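
(As an aside, torch.cuda.manual_seed_all already seeds every visible GPU, so the separate torch.cuda.manual_seed call is redundant, though harmless.)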

This won't be enough on its own, though, because other functions in the project also reseed the random generators, so a few more changes are needed.
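
(As a general PyTorch note, a less fragile alternative to juggling global RNG state is an explicit torch.Generator passed to each sampling call. The Pyramid-Flow pipeline would need modifying to accept one, so this is just the general pattern, not repo code:)

import torch

# An explicit generator keeps seeding local, so other code that touches
# the global RNG can't disturb your samples.
gen = torch.Generator(device='cpu').manual_seed(1000)
noise = torch.randn(4, 8, generator=gen)  # reproducible regardless of global state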

I've found the best place to set the initial seed is in generate_image_to_video(). I've made several changes to that function, so my new signature has some extra parameters, including up to 4 seeds.

def generate_image_to_video(image, prompt, temp, video_guidance_scale, resolution,
                            progress=gr.Progress(), max_segment=0, sub_duration=0, subvid=0,
                            seed_value0=None, seed_value1=None, seed_value2=None, seed_value3=None):

Then, where the model gets initialized, I also set the seeds. The model functions seem to alter the RNG state, so I set the seeds twice, before and after, to be sure. (As I said, this is hacky and experimental.) I'll explain sub_duration later; it's important.

# Initialize model based on user options using cached function
try:
    # Set the seeds for reproducibility, before and after initialization.
    if sub_duration > 0:
        set_all_seeds(seed_value0)
    model, torch_dtype_selected = initialize_model_cached(variant)
    if sub_duration > 0:
        set_all_seeds(seed_value0)

Now, when the model gets called to do the generation, we pass in sub_duration and the seeds.

    if sub_duration > 0:
        with torch.no_grad(), torch.autocast('cuda', dtype=torch_dtype_selected):
            if subvid == 0:
                model.reset_cache()  # for the text embedding

            frames = model.generate_i2v(
                prompt=prompt,
                input_image=image,
                num_inference_steps=[10, 10, 10],
                temp=16,  # set to the maximum duration; we'll only do part of it using sub_duration
                video_guidance_scale=video_guidance_scale,
                output_type="pil",
                cpu_offloading=cpu_offloading,
                save_memory=True,
                callback=progress_callback,
                sub_duration=sub_duration,
                seed_value0=seed_value0,
                seed_value1=seed_value1,
                seed_value2=seed_value2,
                seed_value3=seed_value3,
            )

So now I'll explain sub_duration. I found that if you generate a 1-second movie and then use the same seed to generate a 2-second movie, it WON'T match. Changing the duration slider alters the way the seed affects the render.

Instead, I always set the duration to the maximum (16 for me; I can only do 368p video) but exit early using sub_duration. So if I want a video of duration 7, my call to model.generate_i2v tells it to use duration 16, but with sub_duration=7 it gets out early. This seems to work, and it renders in the time a duration of 7 would take.

So now model.generate_i2v will generate the video, with the random state set from seed_value0.

But as I said, we can also change the seed on the fly and generate different endings. If you go down the code in generate_i2v, you'll find the line

  for unit_index in tqdm(range(1, num_units)):

I changed it to the following, so that I can change the seed at various points, roughly each second at 24 or 25 fps. (generate_i2v lives in the model code rather than app.py, so I just repeat the seeding inline there.)

    if sub_duration == 0:
        num_units_alt = num_units
    else:
        num_units_alt = sub_duration
    print(f"[INFO] num_units: {num_units}  num_units_alt: {num_units_alt}")

    unit_count = 0
    for unit_index in tqdm(range(1, num_units_alt)):  # num_units_alt is either the original duration or sub_duration
        gc.collect()
        torch.cuda.empty_cache()

        # Experimental later seeds:
        # lets us generate different endings for earlier video.
        if unit_index == 4 and seed_value1 is not None:
            random.seed(seed_value1)
            np.random.seed(seed_value1)
            torch.manual_seed(seed_value1)
            torch.cuda.manual_seed(seed_value1)
            torch.cuda.manual_seed_all(seed_value1)
        elif unit_index == 7 and seed_value2 is not None:
            random.seed(seed_value2)
            np.random.seed(seed_value2)
            torch.manual_seed(seed_value2)
            torch.cuda.manual_seed(seed_value2)
            torch.cuda.manual_seed_all(seed_value2)
        elif unit_index == 10 and seed_value3 is not None:
            random.seed(seed_value3)
            np.random.seed(seed_value3)
            torch.manual_seed(seed_value3)
            torch.cuda.manual_seed(seed_value3)
            torch.cuda.manual_seed_all(seed_value3)

Now the loop will end at the real duration you want (sub_duration) rather than 16, and it will change the seed at the breakpoints above, so you can keep the same beginning video and get different endings.
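
For example, with those breakpoints and sub_duration=13, units 1-3 render under seed_value0, units 4-6 under seed_value1, units 7-9 under seed_value2, and units 10-12 under seed_value3. Keep the earlier seeds fixed and vary only the last one to re-roll just the ending.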

One last thing: I also changed the code in extract_text_features.py, where the seed was being reset. I don't know if I needed to do this, but I did it just to be safe.

def main():
    args = get_args()

    init_distributed_mode(args)

    # fix the seed for reproducibility # removed this because I set all random seeds in app.py
    #seed = 42
    #torch.manual_seed(seed)
    #np.random.seed(seed)
    #random.seed(seed)
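
(With that reseed removed, the text-feature extraction now inherits whatever seed app.py set, instead of always resetting to 42.)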

My own code is set up to run batches on folders of images with their associated mp4 files and seeds, but as I said, it's far too hacky and hardwired to a special case to post here. I generate lots of 1-second videos with different seeds, keep the best ones, then generate lots of 2-second videos with the good first half and different seeds for the second half, to find the best ones.
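
For anyone building something similar, the skeleton is roughly the following (illustrative only: the file layout and the generate_variants() helper are stand-ins, not my actual script):

import json
from pathlib import Path

def generate_variants(img_path, kept_seeds, n=4):
    # Hypothetical: call the modified generate_image_to_video() n times,
    # each with kept_seeds plus one fresh seed for the new final second.
    raise NotImplementedError

for img_path in sorted(Path("inputs").glob("*.png")):
    seed_file = img_path.with_suffix(".json")  # seeds chosen so far for this image
    kept_seeds = json.loads(seed_file.read_text()) if seed_file.exists() else []
    generate_variants(img_path, kept_seeds)
    # Review the rendered mp4s by hand and write the winning seed list back.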

Anyway, I just thought I'd post this to give other people help or ideas if they'd like more control over the seeds and want to experiment with generating different endings for a good beginning. Hopefully I didn't forget any important details.

Thank you for sharing the code!