RuntimeError: Given groups=1, weight of size [384, 2304, 3], expected input[2, 256, 2560] to have 2304 channels, but got 256 channels instead
TousakaNagio opened this issue · 1 comments
TousakaNagio commented
Hi,
I followed the instructions to download the video features and convert them to LMDB.
However, when I ran the pretrain script, this RuntimeError occurred:
RuntimeError: Given groups=1, weight of size [384, 2304, 3], expected input[2, 256, 2560] to have 2304 channels, but got 256 channels instead
Could you please help me with this problem?
Thank you very much.
houzhijian commented
Hi, the actual visual feature used as network input is the concatenation of the EgoVLP (dimension: 256), InternVideo-Verb (dimension: 1024), and InternVideo-Noun (dimension: 1024) features, so the overall dimension is 256 + 1024 + 1024 = 2304. Your input has only 256 channels, so in your case you are probably using only the EgoVLP features.
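As a rough illustration of the concatenation described above, here is a minimal NumPy sketch. It is not the repository's loading code: the dictionary keys, the `concat_features` helper, and the per-clip `(T, D)` layout are assumptions made for the example. Only the three channel sizes (256, 1024, 1024) and their total of 2304 come from this thread.

```python
import numpy as np

# Channel sizes as stated in the reply above; the key names are hypothetical.
FEATURE_DIMS = {
    "egovlp": 256,
    "internvideo_verb": 1024,
    "internvideo_noun": 1024,
}
EXPECTED_CHANNELS = sum(FEATURE_DIMS.values())  # 256 + 1024 + 1024 = 2304


def concat_features(features):
    """Concatenate per-clip features of shape (T, D_i) along the channel axis.

    Assumes every stream is a 2-D array with the same number of time steps T.
    Raises ValueError if a stream is missing or has an unexpected shape.
    """
    missing = [name for name in FEATURE_DIMS if name not in features]
    if missing:
        raise ValueError(f"missing feature streams: {missing}")

    parts = []
    for name, dim in FEATURE_DIMS.items():
        arr = np.asarray(features[name])
        if arr.ndim != 2 or arr.shape[1] != dim:
            raise ValueError(
                f"{name}: expected shape (T, {dim}), got {arr.shape}"
            )
        parts.append(arr)

    if len({part.shape[0] for part in parts}) != 1:
        raise ValueError("feature streams have different numbers of time steps")

    # Result has shape (T, 2304), matching the channel count the conv layer expects.
    return np.concatenate(parts, axis=1)


# Dummy data standing in for features read from LMDB.
T = 8
dummy = {
    name: np.zeros((T, dim), dtype=np.float32)
    for name, dim in FEATURE_DIMS.items()
}
fused = concat_features(dummy)
print(fused.shape)  # (8, 2304)
```

Passing only the EgoVLP stream to this helper raises a `ValueError` up front, instead of the channel mismatch surfacing later inside the convolution as in the traceback above.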