voletiv/mcvd-pytorch

UCF101 Unconditional Generation FVD Result (16 frames vs 20 frames)

JunyaoHu opened this issue · 4 comments

Hello. I want to confirm how the FVD for unconditional generation is calculated.
In your paper, you generate 16 frames.


#############
## UCF-101 ##
#############
export config="ucf101"
export data="${data_folder}"
export devices="0"
export nfp="16"

And you calculate FVD between the 16-frame predicted result and the 20-frame original video, right?

For the 20-frame original video:

# real
if future == 0:
    real_fvd = torch.cat([
        cond_original[:, :self.config.data.num_frames_cond*self.config.data.channels],
        real
    ], dim=1)[::preds_per_test] # Ignore the repeated ones

real_fvd = to_i3d(real_fvd)
real_embeddings.append(get_fvd_feats(real_fvd, i3d=i3d, device=self.config.device))

# (3) fake 3: uncond
if calc_fvd3:
    # real uncond
    real_embeddings_uncond.append(real_embeddings2[-1] if second_calc else real_embeddings[-1])
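To make the frame count of the real clip concrete, here is a minimal NumPy sketch of the concatenation above. It assumes, as the slicing in the snippet suggests, that frames are stacked along the channel dimension. The tensor sizes (batch of 2, 3 channels, 64x64) and the helper name `count_frames` are illustrative assumptions, not taken from the repository.

```python
import numpy as np

channels = 3            # assumed RGB frames
num_frames_cond = 4     # data.num_frames_cond from the config in this thread
num_real = 16           # frames in `real` when nfp=16

def count_frames(x, channels):
    """Hypothetical helper: number of frames in a (B, F*C, H, W) array."""
    return x.shape[1] // channels

# Dummy stand-ins for cond_original and real, with frames stacked on axis 1.
cond_original = np.zeros((2, num_frames_cond * channels, 64, 64))
real = np.zeros((2, num_real * channels, 64, 64))

# Same concatenation as in the snippet above, using NumPy instead of torch.
real_fvd = np.concatenate(
    [cond_original[:, :num_frames_cond * channels], real], axis=1
)
n_real_frames = count_frames(real_fvd, channels)
print(n_real_frames)  # 20
```

Under these assumptions the real clip passed to the I3D network has 4 + 16 = 20 frames.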

For the 16-frame predicted result:

pred_uncond = torch.cat(pred_samples, dim=1)[:, :self.config.data.channels*num_frames_pred]
pred_uncond = inverse_data_transform(self.config, pred_uncond)

# fake uncond
fake_fvd_uncond = torch.cat([pred_uncond], dim=1) # We don't want to input the zero-mask
fake_fvd_uncond = to_i3d(fake_fvd_uncond)
fake_embeddings_uncond.append(get_fvd_feats(fake_fvd_uncond, i3d=i3d, device=self.config.device))

Calculating the unconditional FVD result:

# (3) uncond
if calc_fvd3:
    real_embeddings_uncond = np.concatenate(real_embeddings_uncond)
    fake_embeddings_uncond = np.concatenate(fake_embeddings_uncond)
    avg_fvd3, fvd3_traj_mean, fvd3_traj_std, fvd3_traj_conf95 = fvd_stuff(fake_embeddings_uncond, real_embeddings_uncond)
    vid_metrics.update({'fvd3': avg_fvd3, 'fvd3_traj_mean': fvd3_traj_mean, 'fvd3_traj_std': fvd3_traj_std, 'fvd3_traj_conf95': fvd3_traj_conf95})
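For readers unfamiliar with the metric, the following is a rough sketch of the standard Fréchet distance between two sets of embeddings. It is not the repository's `fvd_stuff`; the function name `frechet_distance`, the 8-dimensional random features, and the eigenvalue-based matrix square root are assumptions made for illustration only.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (N, D) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product; clip tiny negative values from round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Random stand-ins for I3D embeddings (real embeddings are much larger).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(256, 8))
fake_feats = rng.normal(loc=0.5, size=(256, 8))

same = frechet_distance(real_feats, real_feats)      # close to zero
shifted = frechet_distance(real_feats, fake_feats)   # clearly larger
```

The distance only compares embedding statistics, so what matters for the question here is how many frames go into each clip before the embeddings are computed.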

Hi Junyao,

No, we still use 16 frames for real data.

See

real, cond, cond_mask = conditioning_fn(self.config, real_, num_frames_pred=num_frames_pred,

and
https://github.com/voletiv/mcvd-pytorch/blob/451da2eb635bad50da6a7c03b443a34c6eb08b3a/runners/ncsn_runner.py#L115C28-L115C29.

@AlexiaJM Hello,

So, you calculate the unconditional FVD between the 20-frame predicted result (pred20) and the 20-frame original video (cond4+real16), right?

When I use your settings for inference,

#############
## UCF-101 ##
#############
export config="ucf101"
export data="${data_folder}"
export devices="0"
export nfp="16"

I only run this shell script.

export exp="ucf10132_big288_4c4_pmask50_unetm"
export exp=${exp_folder}/${exp}
export ckpt="900000"
export config_mod="data.prob_mask_cond=0.50 model.ngf=288 model.n_head_channels=288 data.num_frames=4 data.num_frames_cond=4 training.batch_size=32 sampling.batch_size=60 sampling.max_data_iter=1000 model.arch=unetmore"
sh ./example_scripts/final/base_1f_vidgen_short.sh

It runs both the prediction task and the generation task.

elif self.condp > 0.0 and self.futrf == 0: # (1) Pred + (3) Gen
    num_frames_pred = self.config.sampling.num_frames_pred

In the video prediction task, FVD is calculated between (cond4+real16) and (cond4+pred16), which takes 16/4 = 4 autoregressive steps. This matches my understanding. In the video generation task, FVD is calculated between (cond4+real16) and (pred20), which takes 20/4 = 5 autoregressive steps.
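The arithmetic behind these counts can be sketched as follows. This is only an illustration of the numbers discussed in this thread; the helper name `autoregressive_steps` and the variable names are hypothetical and do not come from the repository.

```python
import math

def autoregressive_steps(frames_to_generate, frames_per_step):
    """Hypothetical helper: model calls needed to produce the frames."""
    return math.ceil(frames_to_generate / frames_per_step)

num_frames = 4        # data.num_frames: frames produced per model call
num_frames_cond = 4   # data.num_frames_cond
nfp = 16              # sampling.num_frames_pred

# (1) Prediction: 16 new frames on top of 4 given conditioning frames.
pred_steps = autoregressive_steps(nfp, num_frames)
pred_fvd_frames = num_frames_cond + nfp            # cond4 + pred16

# (3) Unconditional generation: all 20 frames are generated.
gen_steps = autoregressive_steps(num_frames_cond + nfp, num_frames)
gen_fvd_frames = gen_steps * num_frames            # pred20

real_fvd_frames = num_frames_cond + nfp            # cond4 + real16

print(pred_steps, gen_steps)                             # 4 5
print(pred_fvd_frames, gen_fvd_frames, real_fvd_frames)  # 20 20 20
```

In both tasks the fake and real clips compared by FVD have the same length of 20 frames, which is consistent with the "GENERATING (Uncond) 20 frames" line in the log.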


My config output is as follows:

(P.S. I only changed sampling.preds_per_test=1 and sampling.subsample=5 to get results faster.)

(EDM) ubuntu@ubuntu:~/zzc/code/mcvd-pytorch$ sh /home/ubuntu/zzc/code/mcvd-pytorch/example_scripts/final/sampling_scripts.sh
INFO - main.py - 2024-01-26 03:18:55,408 - Using device: cuda
INFO - main.py - 2024-01-26 03:18:55,409 - Namespace(config='configs/ucf101.yml', data_path='/home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5', seed=1234, exp='/home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm', comment='', verbose='info', resume_training=False, test=False, feats_dir='/home/ubuntu/zzc/code/mcvd-pytorch/datasets', stats_dir='/home/ubuntu/zzc/code/mcvd-pytorch/datasets', stats_download=False, fast_fid=False, fid_batch_size=1000, no_pr=False, fid_num_samples=None, pr_nn_k=None, sample=False, image_folder='images', final_only=True, end_ckpt=None, freq=None, no_ema=False, ni=True, interact=False, video_gen=True, video_folder='/home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm/video_samples/videos_900000_DDPM_100_nfp_16', subsample=None, ckpt=900000, config_mod=['data.prob_mask_cond=0.50', 'model.ngf=288', 'model.n_head_channels=288', 'data.num_frames=4', 'data.num_frames_cond=4', 'training.batch_size=32', 'sampling.batch_size=60', 'sampling.max_data_iter=1000', 'model.arch=unetmore', 'sampling.num_frames_pred=16', 'sampling.preds_per_test=1', 'sampling.subsample=5', 'model.version=DDPM'], start_at=0, command='python main.py --config configs/ucf101.yml --data_path /home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5 --exp /home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm --ni --config_mod data.prob_mask_cond=0.50 model.ngf=288 model.n_head_channels=288 data.num_frames=4 data.num_frames_cond=4 training.batch_size=32 sampling.batch_size=60 sampling.max_data_iter=1000 model.arch=unetmore sampling.num_frames_pred=16 sampling.preds_per_test=1 sampling.subsample=5 model.version=DDPM --ckpt 900000 --video_gen -v videos_900000_DDPM_100_nfp_16', log_path='/home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm/logs')
INFO - main.py - 2024-01-26 03:18:55,410 - Writing log file to /home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm/logs
INFO - main.py - 2024-01-26 03:18:55,410 - Exp instance id = 36017
INFO - main.py - 2024-01-26 03:18:55,410 - Exp comment = 
INFO - main.py - 2024-01-26 03:18:55,410 - Config =
...


<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
INFO - main.py - 2024-01-26 03:18:55,419 - Args =
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ckpt: 900000
command: python main.py --config configs/ucf101.yml --data_path /home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5
  --exp /home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm
  --ni --config_mod data.prob_mask_cond=0.50 model.ngf=288 model.n_head_channels=288
  data.num_frames=4 data.num_frames_cond=4 training.batch_size=32 sampling.batch_size=60
  sampling.max_data_iter=1000 model.arch=unetmore sampling.num_frames_pred=16 sampling.preds_per_test=1
  sampling.subsample=5 model.version=DDPM --ckpt 900000 --video_gen -v videos_900000_DDPM_100_nfp_16
comment: ''
config: configs/ucf101.yml
config_mod:
- data.prob_mask_cond=0.50
- model.ngf=288
- model.n_head_channels=288
- data.num_frames=4
- data.num_frames_cond=4
- training.batch_size=32
- sampling.batch_size=60
- sampling.max_data_iter=1000
- model.arch=unetmore
- sampling.num_frames_pred=16
- sampling.preds_per_test=1
- sampling.subsample=5
- model.version=DDPM
...


<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
INFO - ncsn_runner.py - 2024-01-26 03:18:59,159 - Loading ckpt /home/ubuntu/zzc/code/mcvd-pytorch/checkpoints/ucf10132_big288_4c4_pmask50_unetm/logs/checkpoint_900000.pt
Checking shard_lengths in ['/home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5/shard_0001.hdf5']
h5: Opening /home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5/shard_0001.hdf5... h5: paths 1 ; shard_lengths [13320] ; total 13320
Dataset length: 9624
Checking shard_lengths in ['/home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5/shard_0001.hdf5']
h5: Opening /home/ubuntu/zzc/data/video_prediction/UCF101/UCF101_h5/shard_0001.hdf5... h5: paths 1 ; shard_lengths [13320] ; total 13320
Dataset length: 256
Setting up Perceptual loss...
/home/ubuntu/anaconda3/envs/EDM/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/ubuntu/anaconda3/envs/EDM/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/ubuntu/zzc/code/mcvd-pytorch/models/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done

video_gen dataloader:   0%|          | 0/5 [00:00<?, ?it/s]
INFO - ncsn_runner.py - 2024-01-26 03:19:52,738 - (1) Video Pred
INFO - ncsn_runner.py - 2024-01-26 03:19:52,739 - PREDICTING 16 frames, using a 4 frame model conditioned on 4 frames, subsample=5, preds_per_test=1
DDPM: 1/5, grad_norm: 221.89865112304688, image_norm: 35.91960144042969, grad_mean_norm: 815.6091918945312 | 0/4 [00:00<?, ?it/s]
INFO - __init__.py - 2024-01-26 03:20:06,223 - DDPM: 1/5, grad_norm: 221.89865112304688, image_norm: 35.91960144042969, grad_mean_norm: 815.6091918945312

...

DDPM: 5/5, grad_norm: 378.7744445800781, image_norm: 79.4982681274414, grad_mean_norm: 815.8571166992188
INFO - __init__.py - 2024-01-26 03:20:21,519 - DDPM: 5/5, grad_norm: 378.7744445800781, image_norm: 79.4982681274414, grad_mean_norm: 815.8571166992188
Generating video frames: 100%|██████████| 4/4 [00:29<00:00,  7.30s/it]
INFO - ncsn_runner.py - 2024-01-26 03:27:10,659 - fvd1 True, fvd2 False, fvd3 True
INFO - ncsn_runner.py - 2024-01-26 03:27:10,660 - (3) Video Gen - Uncond - FVD
INFO - ncsn_runner.py - 2024-01-26 03:27:10,660 - GENERATING (Uncond) 20 frames, using a 4 frame model (conditioned on 4 cond + 0 futr frames), subsample=5, preds_per_test=1
DDPM: 1/5, grad_norm: 221.8052520751953, image_norm: 35.507598876953125, grad_mean_norm: 817.7996826171875 | 0/5 [00:00<?, ?it/s]
INFO - __init__.py - 2024-01-26 03:27:11,371 - DDPM: 1/5, grad_norm: 221.8052520751953, image_norm: 35.507598876953125, grad_mean_norm: 817.7996826171875
DDPM: 2/5, grad_norm: 221.9393310546875, image_norm: 53.967655181884766, grad_mean_norm: 810.0634155273438


Yes, you have it right.

Thank you very much, it helps me a lot!