kkoutini/PaSST

Changing tdim for pretrained model

Closed this issue · 3 comments

Thanks for sharing such great work! I want to use the pre-trained model, but changing input_tdim gives an error. My audio clips are relatively short, so I need a smaller input_tdim. How do I do that? The error occurs because the size of a pretrained layer does not match its size in the current model (after setting input_tdim).

Hi, can you give more details about the error?
In general, if the audio is shorter than expected, you can leave input_tdim at its default value and it should be handled automatically.

The error is as follows:

RuntimeError: Error(s) in loading state_dict for PaSST:
size mismatch for time_new_pos_embed: copying a param with shape torch.Size([1, 768, 1, 99]) from checkpoint, the shape in current model is torch.Size([1, 768, 1, 7]).

It comes from the following code:
model = passt.get_model(arch="passt_s_swa_p16_128_ap476", pretrained=True, n_classes=2, in_channels=1, fstride=10, tstride=10, input_fdim=128, input_tdim=78, u_patchout=0, s_patchout_t=40, s_patchout_f=4)
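The two sizes in the error message (99 and 7) can be reproduced with the usual patch-count formula for a strided patch embedding without padding. This is only a sketch under assumptions that the thread does not state: a patch length of 16 frames (suggested by "p16" in the arch name) and a default input_tdim of 998. The helper num_time_patches is hypothetical and not part of PaSST.

```python
def num_time_patches(input_tdim: int, patch: int = 16, tstride: int = 10) -> int:
    """Count patches along the time axis for a strided patch embedding (no padding).

    patch=16 is an assumption taken from "p16" in the arch name.
    """
    return (input_tdim - patch) // tstride + 1


# Assumed default input_tdim of the pretrained checkpoint (not stated in the thread).
checkpoint_len = num_time_patches(998)
# input_tdim requested in the get_model call above.
requested_len = num_time_patches(78)

print(checkpoint_len, requested_len)
```

Under these assumptions the checkpoint stores a time positional embedding with 99 positions, while a model built with input_tdim=78 allocates only 7, which would explain why load_state_dict rejects the parameter.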

Hi, you can keep input_tdim at its default value. The model should handle shorter audio clips.
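The thread does not say how shorter clips are handled internally. One plausible mechanism, shown here purely as an assumption, is that the full-length time positional embedding is kept and only its first positions are used when the input yields fewer time patches. The function crop_time_pos_embed and the nested-list layout are hypothetical illustrations, not PaSST code.

```python
def crop_time_pos_embed(pos_embed, n_time_patches):
    """Keep the first n_time_patches positions of a [channels][time] table.

    Hypothetical helper: illustrates cropping, not the library's actual logic.
    """
    if n_time_patches > len(pos_embed[0]):
        raise ValueError("input is longer than the pretrained time embedding")
    return [row[:n_time_patches] for row in pos_embed]


# Stand-in for a pretrained table with 768 channels and 99 time positions.
full = [[float(t) for t in range(99)] for _ in range(768)]

# A short clip that produces only 7 time patches uses a prefix of the table.
short = crop_time_pos_embed(full, 7)

print(len(short), len(short[0]))
```

With this kind of cropping the checkpoint shapes never change, so the pretrained weights load as-is and the shorter input is accommodated at run time.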