Existing PVRs
huangjy-pku opened this issue · 3 comments
Have you released the implementation of the existing pre-trained visual representations (PVRs) that you evaluated in the paper? Concretely, the evaluation code for CLIP, R3M, MVP, and VIP.
Hello @huangjy-pku,
Thanks for your question. The evaluation code for CLIP, R3M, MVP, and VIP is not part of the current release. However, it may be included in future updates. Stay tuned for further developments!
Thanks for your reply. Actually, we also evaluated visual representations pre-trained on ego-view data (e.g., Ego4D), such as R3M and VIP, on a broad range of visual tasks (see our paper). We found that ego-centric visual representations perform worse than others (e.g., CLIP), even on navigation and manipulation tasks. This is quite surprising, and we conjecture it is due to the domain gap between ego-centric and third-person views.
However, the evaluation results of R3M and VIP in CortexBench are quite promising. This confuses me and is why I'm asking you for more information. I would appreciate any further details or your thoughts :)
Closing this issue since it's becoming a research discussion.
The evaluations are agnostic to the choice of PVR and fully support any model, such as R3M, CLIP, etc. To add one, you can write a config similar to this file, along with a loading function for the target field.
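In case it helps others who land here, below is a minimal sketch of what such a config and loading function could look like, assuming the Hydra convention where a `_target_` field names an importable function. The config fields, the module name `my_pvr_loaders`, and the four-tuple return contract are illustrative assumptions, not the repository's actual API; the `from r3m import load_r3m` call, however, is the public loader from the R3M repo.

```python
# Hypothetical Hydra-style config (e.g. conf/model/r3m.yaml); the real
# config in the repo may use different field names:
#
#   _target_: my_pvr_loaders.load_r3m
#   model_id: resnet50
#
# The _target_ field tells Hydra which loading function to instantiate.

import torchvision.transforms as T


def load_r3m(model_id: str = "resnet50"):
    """Illustrative loading function for an R3M backbone.

    Returns (model, embedding_dim, transform, metadata), a plausible
    contract for a PVR loader; check the repo's existing loaders for
    the exact expected signature.
    """
    from r3m import load_r3m as _load_r3m  # pip install from facebookresearch/r3m

    model = _load_r3m(model_id)
    model.eval()

    transform = T.Compose(
        [
            T.Resize(256),
            T.CenterCrop(224),
            T.ToTensor(),            # yields pixel values in [0, 1]
            lambda x: x * 255.0,     # R3M expects inputs in [0, 255]
        ]
    )
    embedding_dim = 2048  # ResNet-50 feature size
    return model, embedding_dim, transform, {}
```

The idea is that, with such a config in place, the benchmark can instantiate any PVR through Hydra without touching the evaluation code itself.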