Sense-X/UniFormer

Found a new paper from you: InternVideo

SKBL5694 opened this issue · 1 comment

I recently found that you have uploaded another paper on arXiv named InternVideo, but I have only recently become acquainted with UniFormer. It seems that both UniFormerV2 and InternVideo work better than UniFormer. Due to my limited ability, it is difficult for me to quickly learn new papers and code. So if I want to get a better result in my own work, which paper's code do you suggest I base my code on? Since you have made major contributions to all three papers, I would like to hear your opinion. Thanks, and happy new year to you in advance. :)

Thanks for your question! The code for UniFormer and UniFormerV2 is basically the same. As for InternVideo, it contains pretraining and many downstream tasks, so its code may seem redundant.

I would summarize the strengths of UniFormer and UniFormerV2 as follows:

  1. UniFormer is really efficient, which means it runs fast, and we have verified its performance on various vision tasks. However, due to limited computation resources, we did not train it on large-scale datasets such as ImageNet-22K, so its performance seems lower by 2023 standards.
  2. UniFormerV2 can easily achieve strong performance thanks to CLIP pre-training. It is also efficient because of its effective temporal module, but it focuses only on video tasks.

For a fast start, if you focus on video, I suggest you use UniFormerV2. You can simply fine-tune the models we provide, or you can design a novel temporal module.
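To illustrate the fine-tuning route, here is a minimal, hedged PyTorch sketch of the usual recipe: load a pretrained video backbone, swap its classification head for your own label set, optionally freeze the backbone, and train only the new head. The `TinyBackbone` module below is a stand-in for illustration only; it is not the real UniFormerV2 API, whose actual model-building functions live in the Sense-X/UniFormer repo.

```python
# Hypothetical sketch: fine-tuning a pretrained video backbone on new labels.
# TinyBackbone is a placeholder, NOT the real UniFormerV2 model class.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Stand-in for a pretrained video backbone (e.g. a UniFormerV2 variant)."""
    def __init__(self, embed_dim=64, num_classes=400):
        super().__init__()
        # Toy feature extractor over a (C, T, H, W) video clip.
        self.features = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 8 * 16 * 16, embed_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.head(self.features(x))

model = TinyBackbone()

# Replace the classification head for a new downstream label set (10 classes).
model.head = nn.Linear(64, 10)

# Optionally freeze the backbone and train only the new head.
for p in model.features.parameters():
    p.requires_grad = False

opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch: (batch, channels, frames, height, width).
clip = torch.randn(2, 3, 8, 16, 16)
labels = torch.tensor([0, 1])

for _ in range(3):
    opt.zero_grad()
    loss = loss_fn(model(clip), labels)
    loss.backward()
    opt.step()
```

The same head-swap-and-freeze pattern applies to the real checkpoints; designing a novel temporal module would instead mean replacing or extending the backbone's internal temporal blocks rather than just the head.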