Why is Mel hop_len different for preprocess and training?
cyanbx opened this issue · 2 comments
cyanbx commented
Hi, thanks for sharing your great work. I'm a little confused about the mel hop length: it is 250 in data_preprocess but 256 in the dataset used for training. However, when I change the hop_len param of audio_video_spec_fullset_Dataset to 256, I get the following error in the diffusion forward pass:
File "Diff-Foley/training/stage2_ldm/adm/modules/diffusionmodules/openai_unetmodel.py", line 736, in forward
    h = th.cat([h, hs.pop()], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 18 but got size 17 for tensor number 1 in the list.
Any help with this would be appreciated. Thanks a lot.
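For reference, here is a toy sketch of how I understand this kind of skip-connection mismatch can arise when the spectrogram's time dimension is not divisible by the U-Net's total downsampling factor. This is not the repo's code; the starting length (500) and the number of 2x stages (4) are assumptions for illustration only:

```python
# Toy illustration, not Diff-Foley's code: each 2x downsampling stage
# floors odd sizes, so the decoder's upsampled feature map can end up
# one frame off from the stored skip tensor, which is exactly what makes
# th.cat raise the size-mismatch RuntimeError above.
size = 500                  # hypothetical spectrogram time dimension
skips = []
for _ in range(4):          # encoder: 4 assumed 2x downsampling stages
    skips.append(size)
    size //= 2              # floor division drops a frame on odd sizes
for _ in range(4):          # decoder: mirror upsampling stages
    size *= 2
    skip = skips.pop()
    print(f"upsampled to {size}, skip tensor has {skip}")
    size = skip             # a real U-Net needs these to match before cat
```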
kxgong commented
I also ran into a similar problem during training.

File "Diff-Foley/training/stage2_ldm/adm/modules/diffusionmodules/openai_unetmodel.py", line 744, in forward
    h = th.cat([h, hs.pop()], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4 but got size 3 for tensor number 1 in the list.
luosiallen commented
Hey, thanks for raising this. For Stage 2 training and inference we use hop_len 256; for Stage 1 training and inference we use 250. This is for the purpose of temporal alignment.
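A quick way to see how the hop length interacts with the rest of the pipeline is to compare the resulting mel frame counts. A minimal sketch, assuming 16 kHz audio, 8-second clips, and four 2x downsampling stages in the U-Net (these numbers are illustrative assumptions, not taken from the released configs):

```python
sr, duration = 16000, 8    # assumed sampling rate and clip length
num_down = 4               # assumed number of 2x downsampling stages

for hop_len in (250, 256):
    n_frames = sr * duration // hop_len       # approximate mel frame count
    clean = n_frames % (2 ** num_down) == 0   # survives down/upsampling intact?
    print(f"hop_len={hop_len}: {n_frames} frames, "
          f"divisible by 2^{num_down}: {clean}")
```

Under these assumptions, hop_len 250 yields 512 frames, which divides evenly through every stage, while 256 yields 500, which does not; that would explain the shape mismatch reported above when only the dataset's hop_len is changed.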