RuntimeError: Given groups=1, weight of size [384, 2304, 3], expected input[2, 256, 2560] to have 2304 channels, but got 256 channels instead

Question


TousakaNagio opened this issue a year ago · 1 comments

Hi,

I followed the instructions to download the video features and convert them to LMDB.
However, when I ran the pretrain script, this RuntimeError occurred:

RuntimeError: Given groups=1, weight of size [384, 2304, 3], expected input[2, 256, 2560] to have 2304 channels, but got 256 channels instead

Would you please help to deal with this problem?
Thank you very much.

Answer 1 · 2023-09-01T13:23:35.000Z

Hi, the visual feature the network actually takes as input is the concatenation of the EgoVLP (dimension: 256), InternVideo-Verb (dimension: 1024) and InternVideo-Noun (dimension: 1024) features, so the overall dimension is 256 + 1024 + 1024 = 2304. In your case, you might be using only the EgoVLP features, which would explain why the input has 256 channels instead of 2304.
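
To illustrate the dimension arithmetic above, here is a minimal sketch of the concatenation along the channel axis. It is not the repository's actual loading code: the function name `concat_features`, the use of NumPy arrays, and the `(batch, channels, time)` layout are assumptions for illustration (the layout is inferred from `input[2, 256, 2560]` in the error message), and the time axis is shortened to keep the example small.

```python
import numpy as np

# Feature widths as stated in the answer above.
EGOVLP_DIM = 256
INTERNVIDEO_VERB_DIM = 1024
INTERNVIDEO_NOUN_DIM = 1024
EXPECTED_CHANNELS = EGOVLP_DIM + INTERNVIDEO_VERB_DIM + INTERNVIDEO_NOUN_DIM  # 2304


def concat_features(egovlp, verb, noun):
    """Hypothetical helper: join three (batch, channels, time) arrays on the channel axis."""
    fused = np.concatenate([egovlp, verb, noun], axis=1)
    if fused.shape[1] != EXPECTED_CHANNELS:
        raise ValueError(
            f"expected {EXPECTED_CHANNELS} channels, got {fused.shape[1]}"
        )
    return fused


# Dummy features; time_steps is shortened here (the error message shows 2560).
batch, time_steps = 2, 16
egovlp = np.zeros((batch, EGOVLP_DIM, time_steps), dtype=np.float32)
verb = np.zeros((batch, INTERNVIDEO_VERB_DIM, time_steps), dtype=np.float32)
noun = np.zeros((batch, INTERNVIDEO_NOUN_DIM, time_steps), dtype=np.float32)

fused = concat_features(egovlp, verb, noun)
print(fused.shape)  # (2, 2304, 16)
```

If the features are PyTorch tensors rather than NumPy arrays, `torch.cat([...], dim=1)` plays the same role. Checking the channel count before the tensor reaches the first convolution makes a missing feature stream easier to spot than the convolution's own error.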