Visual Token of HowTo100M

Question

zhengsipeng opened this issue 3 years ago · 3 comments

Hi, do you transform the raw videos of the HTM dataset into visual tokens during pre-training? And how large are its visual tokens in total? Since HTM takes 12T of space, I'm curious about the size of its visual tokens.

Answer 1 · 2021-09-29T02:12:08.000Z

We pre-extracted the tokens and used them during pre-training. The pre-extraction script is provided here: video2token.

I do not have the exact disk usage right now. Saving all the tokens should take roughly 100~200G, since the original video is heavily compressed when converted to tokens.
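For anyone who wants to sanity-check that range for their own setup, here is a minimal back-of-envelope sketch. Every input below (video count, average duration, sampling rate, tokens per frame, bytes per token) is a hypothetical placeholder, not a number taken from this repository or from the authors, so plug in your own extraction settings.

```python
def estimate_token_storage_bytes(num_videos, avg_duration_s, sampled_fps,
                                 tokens_per_frame, bytes_per_token):
    """Rough size of pre-extracted visual tokens, ignoring file/container overhead."""
    # Total number of frames that would be tokenized across the dataset.
    total_frames = num_videos * avg_duration_s * sampled_fps
    # Each frame yields a fixed number of token ids of a fixed width.
    return int(total_frames * tokens_per_frame * bytes_per_token)


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    # All values are assumptions for illustration only.
    size = estimate_token_storage_bytes(
        num_videos=1_000_000,   # hypothetical video count
        avg_duration_s=360,     # hypothetical average length in seconds
        sampled_fps=0.5,        # hypothetical frame sampling rate
        tokens_per_frame=64,    # hypothetical tokens per frame
        bytes_per_token=2,      # e.g. token ids stored as 16-bit integers
    )
    print(f"~{to_gib(size):.1f} GiB")
```

The estimate scales linearly with each factor, so doubling the sampling rate or the tokens per frame doubles the disk usage.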

Answer 2 · 2021-10-05T07:02:11.000Z

Hi, can you provide the data processing code for HowTo100M pre-training? It seems a bit different from the other datasets.

Answer 3 · 2022-03-11T08:31:12.000Z

Hi, is there code for processing the HowTo100M videos? It seems that video2token only provides the processing code for the downstream datasets.