Query on running Omnivore model
FranklinLeong opened this issue · 4 comments
Dear Jacob,
Thanks for this wonderful work. The performance is really impressive, which is why I'm trying out the model. Specifically, I'm running the Omnivore model to extract video features, but I'm facing some issues regarding which annotations `.pkl` to use. Initially I used `EPIC_100_validation.pkl` (I created another file that only has participant 2 for testing; for participant 2, this had ~400 rows). I managed to run everything, including VideoMAE. However, when I tried running TIM after merging the features, I just couldn't get the feature indices to agree: in `__getitem__` of `SlidingWindowDataset`, `v_data = self.v_feats[video_id][feat_indices, v_aug_indices]` fails because `feat_indices` always exceeds the dimension.
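To illustrate, here is a minimal sketch of the failure (the shapes and names are assumptions for illustration, not the repo's actual data):

```python
import numpy as np

# Per-video features stored as [num_feats, num_aug, feat_dim];
# ~400 rows, matching what was extracted for participant 2.
v_feats = {"P02_101": np.random.rand(400, 1, 1024)}

feat_indices = np.arange(1000, 1025)     # indices TIM asks for
v_aug_indices = np.zeros(25, dtype=int)  # single augmentation
v_data = v_feats["P02_101"][feat_indices, v_aug_indices]
# IndexError: index 1000 is out of bounds for axis 0 with size 400
```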
Then I thought perhaps I should run Omnivore with `EPIC_1_second_validation_feature_times.pkl`, which I successfully ran with VideoMAE. However, I encountered another issue: there is no `participant_id` column and no `stop_timestamp` column.

In summary, which annotation `.pkl` file should I use for running Omnivore, and how do I solve the respective issue, depending on which file is needed?
Thanks in advance for your help with this. I would greatly appreciate any input you might have.
Hi,
You should use `EPIC_1_second_validation_feature_times.pkl` for the feature extraction. This file contains 1-second features every 0.2 seconds, which is likely why the indices are exceeding the dimension, as `EPIC_100_validation.pkl` will only have the annotated action segments.
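For intuition (a rough sketch, not the repo's code), dense 1-second windows at a 0.2-second stride produce far more feature rows than the annotated segments alone, so indices computed for the dense file overrun segment-level features:

```python
# Count 1-second windows sampled every 0.2 seconds
# (integer milliseconds to avoid float drift; illustrative only).
def num_dense_windows(duration_ms: int, window_ms: int = 1000,
                      stride_ms: int = 200) -> int:
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // stride_ms + 1

# A 10-minute video yields 2996 dense windows, versus only the
# annotated action segments in EPIC_100_validation.pkl.
print(num_dense_windows(600_000))  # -> 2996
```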
The issue when using `EPIC_1_second_validation_feature_times.pkl` happens because we were using an older version of the feature time file with different column headings. To fix this, we have now updated the code so that the correct column headings are used for the up-to-date provided files.

Apologies for any inconvenience, and thank you for highlighting this issue! Pulling the changes should now fix it.
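If anyone is stuck on an older copy of the file, a possible workaround is to reconcile the headings before extraction. This is a hedged sketch: the source column names below are assumptions for illustration, so inspect your file's columns first.

```python
import pandas as pd

df = pd.read_pickle("EPIC_1_second_validation_feature_times.pkl")
print(df.columns)  # check what the file actually provides

# EPIC video ids look like "P02_101", so the participant can be derived:
if "participant_id" not in df.columns:
    df["participant_id"] = df["video_id"].str.split("_").str[0]
# Rename a differently-named stop column, if your file has one
# ("stop_sec" here is a hypothetical old heading):
if "stop_sec" in df.columns and "stop_timestamp" not in df.columns:
    df = df.rename(columns={"stop_sec": "stop_timestamp"})

df.to_pickle("EPIC_1_second_validation_feature_times_fixed.pkl")
```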
Hello,
Thank you for the prompt reply. Indeed it has been fixed, and I am now running the feature extraction without problems, at least. I'll see if any issues arise afterwards.

Thank you also for removing the debugging code; I was initially confused about why it only ran for 10 iterations and had to dig into it a little to find out.

Since we're here: I also encountered another issue while running VideoMAE, where the shape of its feature output did not match the shape of Omnivore's feature output when merging. I looked into the code and made some changes that allowed me to run the merge. This is what I did to resolve the issue. I'm only running with 1 augmentation, so it seemed a bit odd that in the original code only the first index of `all_sets` was kept and saved. Let me know if my implementation would cause issues down the road.
This looks okay. The desired outcome is to have a set of features for each video of shape `[num_feats, num_aug, feat_dim]`, which is what your code appears to do.
However, we found the issue with our code and amended it. The original code did not return the desired outcome, as we used `.extend()` for the `all_sets` list instead of `.append()`, meaning the logic you commented out was no longer valid.
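For clarity, here is a minimal sketch of the pitfall (illustrative names and shapes; this is not the repo's actual code):

```python
import numpy as np

num_feats, feat_dim = 5, 8
# Each augmentation pass yields one array of shape [num_feats, feat_dim]:
aug_passes = [np.random.rand(num_feats, feat_dim) for _ in range(3)]

all_sets = []
for feats in aug_passes:
    all_sets.append(feats)  # one list element per augmentation pass
# all_sets.extend(feats) would instead splice in the rows of `feats`
# one by one, losing the augmentation grouping entirely.

# Stack to the desired [num_feats, num_aug, feat_dim] layout:
merged = np.stack(all_sets, axis=1)
print(merged.shape)  # (5, 3, 8)
```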
The updated code should now work and allow you to use one or more augmentation sets, with the desired shape. Again, thank you for pointing this out!
Thanks for your prompt reply. I greatly appreciate your help with the previous issue. I will now close it.