RenShuhuai-Andy/TimeChat

What is the relationship between segment and timetoken?

sunwhw opened this issue · 3 comments

Hi, thanks for your great work! I want to ask what the relationship is between the original segment (contained in seg_prompt) and time_token_?

Also, is ASR data used when constructing the instruction data?

Hi, thanks for your interest.

The time token is used in Vid2Seq, which uses relative timestamps. Specifically, it quantizes any video of duration $T_i$ into 100 equally spaced timestamps. Accordingly, <time_token_36> represents the time $\frac{36}{100} \cdot T_i$ seconds.
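As a rough illustration of the relative scheme (a minimal sketch, not code from Vid2Seq or TimeChat; the function name `time_token_to_seconds` and the `num_bins` parameter are hypothetical), a token index can be mapped back to seconds like this:

```python
def time_token_to_seconds(token_index: int, duration: float, num_bins: int = 100) -> float:
    """Convert a relative time-token index to seconds.

    Assumes token k stands for k / num_bins of the video duration,
    as described above for <time_token_36>.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= token_index <= num_bins:
        raise ValueError("token_index must lie in [0, num_bins]")
    # Multiply before dividing to keep the arithmetic exact for simple cases.
    return token_index * duration / num_bins


# For a 200-second video, <time_token_36> points at 36/100 of the duration.
seconds = time_token_to_seconds(36, 200.0)
```

Note that the same token maps to different absolute times in videos of different lengths, which is the key difference from absolute timestamps.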

In contrast, we use absolute timestamps, i.e., the original segments contained in seg_prompts. The first number in seg_prompts denotes the duration (in seconds) of the current video. After that, each pair of numbers gives the start and end time of a fragment.
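To make that layout concrete, here is a small sketch that splits such a number sequence into a duration and a list of (start, end) pairs. It assumes the numbers have already been extracted into a flat list; the exact string format of seg_prompts is not shown in this thread, and the helper name `parse_seg_numbers` is hypothetical.

```python
from typing import List, Tuple


def parse_seg_numbers(numbers: List[float]) -> Tuple[float, List[Tuple[float, float]]]:
    """Split [duration, start_1, end_1, start_2, end_2, ...] into its parts.

    Assumes all values are absolute times in seconds.
    """
    if not numbers:
        raise ValueError("expected at least the video duration")
    duration, rest = numbers[0], numbers[1:]
    if len(rest) % 2 != 0:
        raise ValueError("start/end times must come in pairs")
    # Pair up consecutive values: (rest[0], rest[1]), (rest[2], rest[3]), ...
    segments = list(zip(rest[0::2], rest[1::2]))
    return duration, segments


# A 120-second video with two annotated fragments (made-up values).
duration, segments = parse_seg_numbers([120.0, 3.0, 10.5, 40.0, 55.0])
```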

Training on TimeIT-104k uses ASR data, while fine-tuning and evaluation on YouCook2, Charades, and QVHighlights do not use ASR data.

oh, thanks for your clear reply!