Existing PVRs
huangjy-pku opened this issue · 3 comments
Have you released the implementation of the existing pre-trained visual representations (PVRs) that you evaluated in the paper? Concretely, the evaluation code for CLIP, R3M, MVP, and VIP.
Hello @huangjy-pku,
Thanks for your question. The evaluation code for CLIP, R3M, MVP, and VIP is not part of the current release. However, it may be included in future updates. Stay tuned for further developments!
Thanks for your reply. Actually, we also evaluated visual representations pre-trained on ego-view data (e.g., Ego4D), such as R3M and VIP, on a broad range of visual tasks (see our paper). We found that ego-centric visual representations perform worse than others (e.g., CLIP), even on navigation and manipulation tasks. This is quite surprising, and we conjecture it is due to the domain gap between ego-centric and third-person views.
However, the evaluation results of R3M and VIP in CortexBench are quite promising. This confuses me and is why I'm asking you for more information. I would appreciate any further details or your thoughts :)
Closing this issue since it's becoming a research discussion.
The evaluations are agnostic to the choice of PVR and fully support any model, such as R3M, CLIP, etc. To add one, you can write a config similar to this file, along with a loading function for the target field.
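In case it helps others who land here, below is a minimal sketch of what such a config and loading function could look like, assuming the Hydra convention where a `_target_` field names an importable function. The config fields, the module name `my_pvr_loaders`, and the four-tuple return contract are illustrative assumptions, not the repository's actual API; the `from r3m import load_r3m` call, however, is the public loader from the R3M repo.

```python
# Hypothetical Hydra-style config (e.g. conf/model/r3m.yaml); the real
# config in the repo may use different field names:
#
#   _target_: my_pvr_loaders.load_r3m
#   model_id: resnet50
#
# The _target_ field tells Hydra which loading function to instantiate.

import torchvision.transforms as T


def load_r3m(model_id: str = "resnet50"):
    """Illustrative loading function for an R3M backbone.

    Returns (model, embedding_dim, transform, metadata), a plausible
    contract for a PVR loader; check the repo's existing loaders for
    the exact expected signature.
    """
    from r3m import load_r3m as _load_r3m  # pip install from facebookresearch/r3m

    model = _load_r3m(model_id)
    model.eval()

    transform = T.Compose(
        [
            T.Resize(256),
            T.CenterCrop(224),
            T.ToTensor(),            # yields pixel values in [0, 1]
            lambda x: x * 255.0,     # R3M expects inputs in [0, 255]
        ]
    )
    embedding_dim = 2048  # ResNet-50 feature size
    return model, embedding_dim, transform, {}
```

The idea is that, with such a config in place, the benchmark can instantiate any PVR through Hydra without touching the evaluation code itself.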