showlab/EgoVLP

Question about Ego4D annotation

zhengsipeng opened this issue · 5 comments

Hi, thanks for your great work! I have a small question about Ego4D annotation.

I notice that in the narration.json file, the i-th narration for a video is labeled like this:
{timestamp_sec: 19.2, timestamp_frame: 1823, narration_text: "hello"}

I understand that 'timestamp_sec' is the start timestamp of the i-th narration, but what is its end timestamp? Is it the (i+1)-th narration's timestamp_sec? I notice that quite often the (i+1)-th timestamp_sec < the i-th timestamp_sec. Does that mean the annotation is faulty?

How did you generate egoclip.csv using the narration?

Hi @zhengsipeng ,

We do not know the exact start and end timestamps of the i-th narration; we only know that the visual moment at timestamp_sec is aligned with the i-th narration.
"(i+1)-th timestamp_sec < i-th timestamp_sec" may happen because the original narration.json may be out of order; we re-sort it by timestamp.
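
For readers who want to reproduce the re-sorting step, here is a minimal sketch. It assumes the narrations of one video have already been loaded into a list of dicts with the fields shown in the question; the sample records and the helper name `sort_narrations` are hypothetical and not taken from the EgoVLP code.

```python
from operator import itemgetter

# Hypothetical narration records for one video, deliberately out of order.
narrations = [
    {"timestamp_sec": 19.2, "narration_text": "C picks up a cup"},
    {"timestamp_sec": 3.5, "narration_text": "C opens the door"},
    {"timestamp_sec": 11.0, "narration_text": "C walks to the table"},
]


def sort_narrations(records):
    """Return a new list ordered by timestamp_sec (input is left untouched)."""
    return sorted(records, key=itemgetter("timestamp_sec"))


ordered = sort_narrations(narrations)
for rec in ordered:
    print(rec["timestamp_sec"], rec["narration_text"])
```

`sorted` is stable, so narrations that share the same timestamp keep their original relative order.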

As for how we create egoclip.csv, please refer to the relevant details in our paper.

Thank you for your reply.
I also ended up re-sorting by timestamp. It's a pity that each segment does not have an exact start/end timestamp.

@zhengsipeng, yep, this is the main issue in using the Ego4D data, and it is also why we propose EgoClip...

Have you tried directly using the i-th and (i+1)-th timestamps as the start/end of the i-th narration for pre-training?
If so, does it perform worse than your EgoClip annotation?

@zhengsipeng
Yes, we have tried this variant,
and the performance is quite poor, because directly using "the i-th and (i+1)-th timestamps as the start/end of the i-th narration" is too loose. There is likely to be a long period between two timestamps where nothing happens.
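
To make the "too loose" point concrete, here is a small sketch of that naive variant. The timestamps are made up for illustration and `naive_segments` is a hypothetical helper; this is not the EgoClip procedure, which is described in the paper.

```python
def naive_segments(timestamps):
    """Pair each timestamp with the next one as (start, end).

    The last narration has no successor, so it gets no segment here.
    """
    ordered = sorted(timestamps)
    return list(zip(ordered[:-1], ordered[1:]))


# Hypothetical narration timestamps (seconds) with one long pause.
timestamps = [3.5, 5.0, 6.0, 60.0]

segments = naive_segments(timestamps)
durations = [end - start for start, end in segments]

print(segments)   # [(3.5, 5.0), (5.0, 6.0), (6.0, 60.0)]
print(durations)  # [1.5, 1.0, 54.0]
```

In this toy case the third narration is assigned a 54-second clip, even though the narrated action may only take a second or two; the rest of that clip is unrelated footage paired with the text.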