How long does it take for pre-training on TV with MLM+MNCE from scratch?
HenryHZY opened this issue · 4 comments
@linjieli222
Hi, thanks for your great project!
As mentioned in your paper, the best pre-trained HERO needs to be trained on 16 V100 GPUs for about 3 weeks.
Because my GPUs and memory are limited, I would like to start with pre-training on TV with MLM+MNCE (that is, L2 in Table 1 in your paper).
I would like to ask three questions:
- How long does it take to pre-train on TV with MLM+MNCE from scratch? (L2 in Table 1 in your paper)
- Could you please show me the commands for pre-training on TV with MLM+MNCE and fine-tuning on TVR from scratch? I am a novice in pre-training projects. :)
  I think I need to run this experiment in 7 steps:
  1/ download TV dataset
  2/ Text & Video feature extraction from TV dataset, or directly use the Text & Video features provided by you
  3/ pre-training on TV with MLM+MNCE
  4/ download TVR dataset
  5/ Text & Video feature extraction from TVR dataset, or directly use the Text & Video features provided by you
  6/ fine-tuning & inference on TVR
  7/ submit results to TVR codalab
- I find that downloading with
  bash scripts/download_tvr.sh $PATH_TO_STORAGE
  is too slow, less than 1m/s. Do you have another download server?
[Done. No need to reply to this question.]
@linjieli222
For question 2, are the following commands correct? (Just copied from your README.md)
1/ download TV dataset
2/ Text & Video feature extraction from TV dataset
Here, I directly use the Text & Video features provided by you:
# outside of the container
bash scripts/download_tv_pretrain.sh $PATH_TO_STORAGE
3/ pre-training on TV with MLM+MNCE
# inside of the container
horovodrun -np 16 python pretrain.py --config config/pretrain-tv-16gpu.json --output_dir $PRETRAIN_EXP
with line 11 of HERO/config/pretrain-tv-16gpu.json (at commit 32c1c52) changed
from
"tasks": ["mlm", "mfm-nce", "fom", "vsm"]
to
"tasks": ["mlm", "mfm-nce"]
4/ download TVR dataset
5/ Text & Video feature extraction from TVR dataset
Here, I directly use the Text & Video features provided by you
bash scripts/download_tvr.sh $PATH_TO_STORAGE
6/ fine-tuning & inference on TVR
# fine-tuning, inside the container
horovodrun -np 8 python train_vcmr.py --config config/train-tvr-8gpu.json
# inference, inside the container
horovodrun -np 8 python eval_vcmr.py --query_txt_db /txt/tvr_val.db/ --split val \
--vfeat_db /video/tv/ --sub_txt_db /txt/tv_subtitles.db/ \
--output_dir /storage/tvr_default/ --checkpoint 4800 --fp16 --pin_mem
7/ submit results to TVR codalab
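The config change in step 3/ can also be scripted rather than edited by hand. The following is only a minimal sketch: `restrict_tasks` is a hypothetical helper (not part of HERO), and the `base` dict is a stand-in that contains just the `"tasks"` entry quoted above, not the full contents of config/pretrain-tv-16gpu.json.

```python
import json


def restrict_tasks(config, keep):
    """Return a copy of `config` whose "tasks" list holds only the tasks in `keep`.

    Hypothetical helper for illustration; it is not part of the HERO code base.
    """
    missing = [task for task in keep if task not in config["tasks"]]
    if missing:
        raise ValueError(f"tasks not present in config: {missing}")
    new_config = dict(config)
    # Preserve the original ordering of the tasks that are kept.
    new_config["tasks"] = [task for task in config["tasks"] if task in keep]
    return new_config


# Stand-in for the real config: only the "tasks" entry quoted in this thread.
base = {"tasks": ["mlm", "mfm-nce", "fom", "vsm"]}
ablation = restrict_tasks(base, ["mlm", "mfm-nce"])
print(json.dumps(ablation))
```

With a real config, one would load the file with `json.load`, apply the helper, and write the result to a new file passed via `--config`, so the original 16-GPU config stays untouched.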
It was more than a year ago that we conducted the pre-training ablation experiments. From what I recall, it may take about 2-3 days on 8 GPUs.
Note that you will need to reduce the pre-training steps by half for MLM+MFM-NCE if you want to strictly follow our settings in the pre-training ablation table.
And remember to change the pre-trained checkpoint in config/train-tvr-8gpu.json for fine-tuning.
One more useful tip: if you ever find the download slow, please use azcopy. You can refer to VALUE-Leaderboard/StarterCode/scripts/download_tvr.sh.
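The two config adjustments mentioned above (halving the pre-training steps and pointing the fine-tuning config at the new checkpoint) could be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the key names `num_train_steps` and `checkpoint`, both helper functions, the step count, and the paths are hypothetical placeholders, so check the actual JSON files for the real keys and values.

```python
import copy


def halve_train_steps(config, steps_key="num_train_steps"):
    """Return a copy of `config` with its step budget halved (integer division).

    `steps_key` is an assumed key name; verify it against the real config file.
    """
    new_config = copy.deepcopy(config)
    new_config[steps_key] = config[steps_key] // 2
    return new_config


def point_to_checkpoint(config, ckpt_path, ckpt_key="checkpoint"):
    """Return a copy of `config` that loads pre-trained weights from `ckpt_path`.

    `ckpt_key` is an assumed key name; verify it against the real config file.
    """
    new_config = copy.deepcopy(config)
    new_config[ckpt_key] = ckpt_path
    return new_config


# Placeholder values only; these are NOT the numbers or paths used by HERO.
pretrain_cfg = {"tasks": ["mlm", "mfm-nce"], "num_train_steps": 100000}
finetune_cfg = {"checkpoint": "/path/to/default_pretrained.pt"}

ablation_cfg = halve_train_steps(pretrain_cfg)
tvr_cfg = point_to_checkpoint(finetune_cfg, "/path/to/ablation_checkpoint.pt")
print(ablation_cfg["num_train_steps"])
print(tvr_cfg["checkpoint"])
```

Both helpers return modified copies, so the original config dicts stay available for comparison.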
Thanks for your quick reply!
VALUE is really a great project, and it includes VALUE-StarterCode and VALUE-DataRelease.
Maybe I could use VALUE-StarterCode as a better starting point for my adventure into video pre-training.
I would like to close this issue for now and reopen it if I have other questions later. Thanks again.