sukjunhwang/VITA

Dataset preparation

ChinChyi opened this issue · 1 comment

Why is "STEP-2: Prepare annotations for combined data" needed? Why do we use the COCO dataset again for the second round of fine-tuning, even though it has already been used for pre-training?

It's because the video datasets contain relatively few videos. COCO images are augmented into pseudo-videos and used for training together with the video data.
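To illustrate the idea of turning a still image into a pseudo-video, here is a minimal sketch. It is not the repository's actual pipeline: the function name `image_to_pseudo_clip`, its parameters, and the choice of random translation as the only augmentation are assumptions made for illustration.

```python
import numpy as np


def image_to_pseudo_clip(image, num_frames=3, max_shift=8, seed=None):
    """Turn one still image (H, W, C) into a short clip (T, H, W, C).

    Hypothetical helper: each frame is the same image translated by a
    random offset, which loosely mimics small camera or object motion.
    """
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]

    # Pad with edge values so every shifted crop stays inside the array.
    pad = [(max_shift, max_shift), (max_shift, max_shift)]
    pad += [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")

    frames = []
    for _ in range(num_frames):
        # Offsets in [0, 2 * max_shift]; max_shift means "no translation".
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        frames.append(padded[dy:dy + height, dx:dx + width])
    return np.stack(frames)


if __name__ == "__main__":
    fake_image = np.zeros((480, 640, 3), dtype=np.uint8)
    clip = image_to_pseudo_clip(fake_image, num_frames=3, seed=0)
    print(clip.shape)
```

In a real setup the instance masks and boxes would need the same per-frame transform as the image, so that the annotations stay aligned across the generated frames.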