What is the relationship between segment and timetoken?
sunwhw opened this issue · 3 comments
Also, is ASR data used when constructing the instruction data?
Hi, thanks for your interest.
The time token is used in Vid2Seq, which uses relative timestamps. Specifically, it quantizes a video of any duration T into 100 time tokens, so <time_token_36> represents the time 36/100 * T.
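To make the relative scheme concrete, here is a minimal sketch of the mapping between a time-token index and seconds. It assumes 100 bins (as implied by the 36/100 example) and floor rounding when going from seconds to a token; the function names are hypothetical and are not taken from the Vid2Seq code.

```python
NUM_BINS = 100  # assumption: 100 time tokens, as implied by the 36/100 example


def time_token_to_seconds(token_index: int, duration: float,
                          num_bins: int = NUM_BINS) -> float:
    """Map a relative time-token index to seconds for a video of `duration` seconds."""
    return token_index * duration / num_bins


def seconds_to_time_token(t: float, duration: float,
                          num_bins: int = NUM_BINS) -> int:
    """Map a time in seconds to a time-token index (floor rounding is an assumption)."""
    index = int(t * num_bins / duration)
    # Clamp so that t == duration still maps to the last valid token.
    return min(max(index, 0), num_bins - 1)


# Illustrative values only: <time_token_36> in a 50-second video.
seconds = time_token_to_seconds(36, 50.0)
token = seconds_to_time_token(seconds, 50.0)
print(f"<time_token_{token}> is at {seconds} s")
```

The same token therefore points to a different absolute time in every video, since it is scaled by that video's duration.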
In contrast, we use absolute timestamps, i.e., the original segments contained in seg_prompts. The first number in seg_prompts denotes the duration (in seconds) of the current video. After that, each pair of numbers gives the start and end time of a segment.
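As a rough illustration of that layout, the sketch below splits one seg_prompts entry into the duration and a list of (start, end) pairs. It assumes the entry is already available as a flat sequence of numbers; the actual storage format in the repository may differ, and parse_seg_prompt is a hypothetical helper, not a function from the codebase.

```python
from typing import List, Sequence, Tuple


def parse_seg_prompt(values: Sequence[float]) -> Tuple[float, List[Tuple[float, float]]]:
    """Split [duration, start_1, end_1, start_2, end_2, ...] into its parts."""
    if len(values) < 1 or (len(values) - 1) % 2 != 0:
        raise ValueError("expected a duration followed by (start, end) pairs")
    duration = float(values[0])
    rest = values[1:]
    # Walk the remaining numbers two at a time: (start, end) in seconds.
    segments = [(float(rest[i]), float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return duration, segments


# Illustrative values only: a 120-second video with two annotated segments.
duration, segments = parse_seg_prompt([120, 3, 15, 40, 62])
print(duration, segments)
```

Because these values are absolute seconds, they can be read directly without rescaling by the video duration.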
Training on TimeIT-104k uses ASR data, while fine-tuning and evaluation on YouCook2, Charades, and QVHighlights do not use ASR data.
oh, thanks for your clear reply!