JacobChalk/TIM

Query on running Omnivore model

FranklinLeong opened this issue · 4 comments

Dear Jacob,

Thanks for this wonderful work. The performance is really impressive, which is why I am thinking of trying out the model. Specifically, I'm now trying to run the Omnivore model to extract video features. However, I'm facing some issues regarding which annotations .pkl to use. Initially I used EPIC_100_validation.pkl (I created another file that only has participant 2 for testing), which had ~400 rows for participant 2. I managed to run everything, including VideoMAE. However, when I tried running TIM after merging the features, I just couldn't get the feature indices to agree: in __getitem__ of SlidingWindowDataset, v_data = self.v_feats[video_id][feat_indices, v_aug_indices] always fails because feat_indices exceeds the feature dimension.
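Roughly, the failure looks like this (the shapes and index values below are made up purely to illustrate the mismatch, not taken from the actual data):

```python
import numpy as np

num_feats, num_aug, feat_dim = 400, 1, 1024            # ~400 rows extracted from EPIC_100_validation.pkl
v_feats = {"P02_01": np.random.rand(num_feats, num_aug, feat_dim)}

# SlidingWindowDataset builds indices over a dense timeline, so they can point
# far beyond the 400 rows that actually exist in the extracted features.
feat_indices = np.array([1200, 1201, 1202])
v_aug_indices = np.zeros_like(feat_indices)

v_data = v_feats["P02_01"][feat_indices, v_aug_indices]  # IndexError: index 1200 is out of bounds
```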

Then I thought that perhaps I had to run Omnivore with EPIC_1_second_validation_feature_times.pkl, which I had successfully run with VideoMAE. However, I encountered another issue: that file has no participant_id column and no stop_timestamp column.

In summary, which annotation .pkl files should I use for running Omnivore, and how do I solve the respective issue depending on which file is needed?

Thanks in advance for your help with this. I would greatly appreciate any input you might have.

Hi,

You should use EPIC_1_second_validation_feature_times.pkl for the feature extraction. It contains 1-second features every 0.2 seconds, which is likely why the indices exceed the dimension: EPIC_100_validation.pkl only contains the annotated action segments.
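For intuition, here is the rough arithmetic behind the mismatch (the 1-second window and 0.2-second stride come from the file itself; the video duration and segment count below are only illustrative):

```python
# Dense feature grid implied by EPIC_1_second_validation_feature_times.pkl
video_duration_s = 300.0            # hypothetical 5-minute video
window_s, stride_s = 1.0, 0.2
num_dense_feats = int((video_duration_s - window_s) / stride_s) + 1   # ~1496 windows

# Versus features extracted only for annotated segments (EPIC_100_validation.pkl)
num_annotated_segments = 120        # hypothetical count of action segments in that video

# TIM's sliding windows index into the dense grid, so feat_indices can reach ~1496,
# well past the end of an array holding only ~120 segment features.
print(num_dense_feats, num_annotated_segments)
```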

The issue when using EPIC_1_second_validation_feature_times.pkl happens because we were using an older version of the feature-time file with different column headings. To fix this, we have now updated the code so that the correct column headings are used for the up-to-date provided files.

Apologies for any inconvenience and thank you for highlighting this issue! Pulling the changes should now fix this.

Hello,

Thank you for the prompt reply. Indeed it has been fixed, and I am now running the feature extraction part without problems, at least. I will see if any issues arise afterwards.

Thank you also for removing the debugging code; I was initially confused about why it only ran for 10 iterations and had to dig a little into it to find out.

While we're here, I also encountered another issue while running VideoMAE: the shape of its feature output did not match the shape of the Omnivore feature output when merging. I looked into the code and made some changes that allowed me to run the merging. Below is what I did to resolve the issue. I'm only running with one augmentation, so it seemed a bit odd that in the original code only the first index of all_sets was kept and saved. Let me know if my implementation would cause issues down the road.

(screenshot of the modified merging code)

This looks okay. The desired outcome is a set of features for each video of shape [num_feats, num_aug, feat_dim], which is what your code appears to produce.

However, we found the issue with our code and have amended it. The original code did not produce the desired outcome because we used .extend() on the all_sets list instead of .append(), meaning the logic you commented out was no longer valid.

The updated code should now work and allow you to use one or more augmentation sets, with the desired shape. Again, thank you for pointing this out!
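For anyone hitting the same problem, here is a minimal sketch of the difference (array sizes and variable names are illustrative, not the repository's exact code):

```python
import numpy as np

num_feats, feat_dim = 1496, 1024
aug_sets = [np.random.rand(num_feats, feat_dim) for _ in range(2)]   # one array per augmentation

all_sets = []
for aug_feats in aug_sets:
    all_sets.append(aug_feats)       # .append() keeps one entry per augmentation
    # all_sets.extend(aug_feats)     # .extend() would splice in individual rows, flattening the list

merged = np.stack(all_sets, axis=1)  # -> [num_feats, num_aug, feat_dim], i.e. (1496, 2, 1024)
```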

Thanks for your prompt reply. I greatly appreciate your help with this issue. I will now close it.