features

Question


wolfworld6 opened this issue 3 years ago · 2 comments

Thanks. I am confused about the features for temporal action localization on Epic-Kitchens-100. For example, the feature P01_13_0005000_0010000.pkl has shape (38, 6144), but in the config file NUM_INPUT_CHANNELS is 6912. Also, is the input image size used to extract the features 224?

Answer 1 · 2022-03-29T02:53:51.000Z

Thanks for using our features. 6144 = 3 * 2048: the Epic-Kitchens features we have currently open-sourced are extracted with TAdaConv, and the 3 means that each video is cropped three times (i.e., left, middle and right), with each crop contributing 2048 dimensions. 6912 is the dimension of the ViViT features, which are still going through the approval process and have not been released yet.
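To make the 6144 = 3 * 2048 arithmetic concrete, here is a minimal sketch. It assumes that the three crop features are concatenated along the channel axis in left/middle/right order (the answer does not state the layout) and that the .pkl has already been loaded into a NumPy array of shape (T, 6144). `split_crops` and `check_channels` are hypothetical helpers, not part of the released code. The channel check shows why, presumably, NUM_INPUT_CHANNELS has to match the width of the features actually used: 6144 for the TAdaConv features rather than the ViViT value of 6912.

```python
from __future__ import annotations

import numpy as np

CROP_DIM = 2048  # per-crop TAdaConv feature size (6144 = 3 * 2048)
NUM_CROPS = 3    # left, middle and right crops


def split_crops(features: np.ndarray) -> list[np.ndarray]:
    """Split a (T, 6144) array into three (T, 2048) per-crop arrays.

    ASSUMPTION: the crops are concatenated along the last (channel) axis.
    """
    expected = NUM_CROPS * CROP_DIM
    if features.ndim != 2 or features.shape[1] != expected:
        raise ValueError(f"expected shape (T, {expected}), got {features.shape}")
    # np.split slices along axis 1 and returns views, so no data is copied.
    return np.split(features, NUM_CROPS, axis=1)


def check_channels(features: np.ndarray, num_input_channels: int) -> None:
    """Raise if the feature width does not match the config's NUM_INPUT_CHANNELS."""
    if features.shape[-1] != num_input_channels:
        raise ValueError(
            f"features have {features.shape[-1]} channels, "
            f"but NUM_INPUT_CHANNELS is {num_input_channels}"
        )


# Zero-filled stand-in for a loaded clip such as P01_13_0005000_0010000.pkl.
feats = np.zeros((38, 6144), dtype=np.float32)
left, middle, right = split_crops(feats)
print(left.shape)            # (38, 2048)
check_channels(feats, 6144)  # passes; 6912 (the ViViT width) would raise
```

Splitting is only needed if a model should see one crop at a time; a model configured for 6144 input channels can consume the concatenated array directly.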

TAdaConv supports an input resolution of 256, so it is better to use 256. ViViT, however, only supports 224.
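The answer names the three crops and the input sizes but not the exact preprocessing. The sketch below assumes a common recipe: the shorter side of each frame is resized to the crop size, and three square crops are taken at the start, middle and end of the longer side (left/middle/right for landscape frames). `three_crop_boxes` and `resize_short_side` are hypothetical names, and the 1920x1080 frame size is only an example input, not a statement about the dataset.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right) in resized-frame pixels


def resize_short_side(height: int, width: int, short_side: int) -> Tuple[int, int]:
    """Return the (height, width) after scaling the shorter side to short_side."""
    if height <= width:
        return short_side, round(width * short_side / height)
    return round(height * short_side / width), short_side


def three_crop_boxes(height: int, width: int, crop: int) -> List[Box]:
    """Three square crops along the longer side (left/middle/right if landscape)."""
    h, w = resize_short_side(height, width, crop)
    if w >= h:
        offsets = (0, (w - crop) // 2, w - crop)
        return [(0, x, crop, x + crop) for x in offsets]
    offsets = (0, (h - crop) // 2, h - crop)
    return [(y, 0, y + crop, crop) for y in offsets]


# 1920x1080 frames with a 256 crop (TAdaConv) vs a 224 crop (ViViT).
print(three_crop_boxes(1080, 1920, 256))
# [(0, 0, 256, 256), (0, 99, 256, 355), (0, 199, 256, 455)]
print(three_crop_boxes(1080, 1920, 224))
# [(0, 0, 224, 224), (0, 87, 224, 311), (0, 174, 224, 398)]
```

Because the boxes depend on the crop size, features extracted at 256 and at 224 cover slightly different image regions, which is one more reason not to mix the two settings.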

Answer 2 · 2022-03-29T10:40:21.000Z

Thanks. So the model architecture using TAdaConv for Epic-Kitchens-100 will not be released for now?