nivha/single_video_generation

AssertionError in run_generation.py


Hello, I'm having some problems trying to run the code.

First of all, I ran into a device issue in utils/resize_right.py. I had to set the device explicitly in get_field_of_view:

mirror = fw_cat((fw.arange(in_sz), fw.arange(in_sz - 1, -1, step=-1)), fw)
mirror = fw_set_device(mirror, projected_grid.device, fw) # <-- avoids device error in the following line
field_of_view = mirror[fw.remainder(field_of_view, mirror.shape[0])]
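
For reference, fw_set_device is one of resize_right's framework helpers; I assume it does something like the following (a sketch, not the actual implementation):

import numpy

def fw_set_device(x, device, fw):
    # numpy arrays have no device; torch tensors are moved with .to()
    return x if fw is numpy else x.to(device)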

Now I'm running into an AssertionError related to the temporal dimension:

Namespace(gpu='0', results_dir='output/', frames_dir='data/', start_frame=1, end_frame=15, max_size=144, min_size=(3, 15), downfactor=(0.85, 0.85), J=5, J_start_from=1, kernel_size=(3, 7, 7), sthw=(0.5, 1, 1), reduce='median', vgpnn_type='pm', use_noise=True, noise_amp=5, verbose=True, save_intermediate=True, save_intermediate_path='output/')
Traceback (most recent call last):
  File "/home/hans/code/vgpnn/run_generation.py", line 82, in <module>
    VGPNN, orig_vid = vgpnn.get_vgpnn(
  File "/home/hans/code/vgpnn/vgpnn.py", line 293, in get_vgpnn
    assert (
AssertionError: smallest pyramid level has less frames 2 than temporal kernel-size 3. You may want to increase min_size of the temporal dimension

I'm using all the default settings; I've only set --frames_dir and --results_dir. My video is square and about 300 frames long (but the default settings only look at the first 15 frames).

I've also tried setting --min_size 4,15 but ran into exactly the same error.

Any idea what could be going wrong?

nivha commented

Hi,
Yes, the problem is that the min_size interface is a bit misleading...
You set min_size=(4, 15), but if you look at the coarsest scale, the temporal dimension there is not 4 but 2, and therefore smaller than the kernel size, which is 3. The model cannot operate on such an input; the temporal dimension has to be at least the temporal kernel size.
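
Roughly, the failing check is equivalent to this (a simplified sketch; the exact rounding in the pyramid construction may differ):

min_size = (4, 15)      # (T, HW) as you passed on the command line
sthw = (0.5, 1, 1)      # run_generation.py's default
kernel_size = (3, 7, 7)

coarsest_t = round(sthw[0] * min_size[0])  # 0.5 * 4 -> 2 frames
assert coarsest_t >= kernel_size[0], (
    f"smallest pyramid level has less frames {coarsest_t} "
    f"than temporal kernel-size {kernel_size[0]}"
)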

The easy workaround is to set min_size larger than 4 (say 5 or 6, or more) until you get the desired size at the coarsest scale, as in the example below. That's how I've been dealing with this issue (and it's what the assertion error suggests).
I agree that the interface in its current state is a bit misleading.
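
For example, with the same flag syntax as your run (data/myvideo/ is just a placeholder for your frames directory):

python run_generation.py --frames_dir data/myvideo/ --min_size 6,15

With min_size[0]=6 and sthw[0]=0.5, the coarsest scale should end up with round(0.5 * 6) = 3 frames, which matches the temporal kernel size.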

The only strange thing is that this should only happen if sthw[0] is less than 1... but the default is 1, is it not?
Can you maybe paste the parameters that you feed into vgpnn.get_vgpnn (3rd cell in the notebook)?

Thanks

nivha commented

This works for me:

VGPNN, orig_vid = vgpnn.get_vgpnn(
    frames_dir=f'data/{vid_name}',
    start_frame=1,
    end_frame=15,
    device=device,
    max_size=100,
    min_size=(4, 15),  # (T,H,W)
    downfactor=(0.85, 0.85),
    J=5,
    J_start_from=2,
    kernel_size=(3,7,7),
    sthw=(1,1,1),
    reduce='median',
    vgpnn_type='pm',
)

I don't see why you would encounter this problem unless the parameters are different.
I'd be happy to help if you send me your parameters. Also, what is the exact size (H, W) of your frames, so I can try to debug this?

Hi @nivha, thanks for the response :)

The default settings of run_generation.py give me the following:

VGPNN, orig_vid = vgpnn.get_vgpnn(
    frames_dir="data/icoqry/",
    start_frame=1,
    end_frame=15,
    device="cuda:0",
    max_size=144,
    min_size=(3, 15),
    downfactor=(0.85, 0.85),
    J=5,
    J_start_from=1,
    kernel_size=(3, 7, 7),
    sthw=(0.5, 1, 1),
    reduce="median",
    vgpnn_type="pm",
)

Setting sthw=(1, 1, 1) instead works! Thanks for your help.