sming256/OpenTAD

AdaTAD Training Resume


Hi Shuming,

Will the training automatically resume if I kill the process and run it again?

Thanks.

We support resuming training from a checkpoint.

For example, to resume AdaTAD training on THUMOS, pass the path of a saved checkpoint with `--resume`:
torchrun --nnodes=1 --nproc_per_node=2 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 tools/train.py configs/adatad/thumos/e2e_thumos_videomae_s_768x1_160_adapter.py --resume exps/thumos/adatad/e2e_actionformer_videomae_s_768x1_160_adapter/gpu2_id0/epoch_21.pth
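
If you would rather not look up the newest checkpoint by hand after killing a run, a small helper along these lines could pick it for you. This is only a sketch and not part of OpenTAD: the names `latest_checkpoint` and `resume_args` are hypothetical, and it assumes checkpoints are saved as `epoch_<N>.pth` inside the experiment directory, as in the command above.

```python
import re
from pathlib import Path
from typing import List, Optional

# Assumption: checkpoints are named epoch_<N>.pth, as in the example path above.
_EPOCH_RE = re.compile(r"^epoch_(\d+)\.pth$")


def latest_checkpoint(exp_dir) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    exp_path = Path(exp_dir)
    if not exp_path.is_dir():
        return None

    best_path = None
    best_epoch = -1
    for path in exp_path.iterdir():
        match = _EPOCH_RE.match(path.name)
        if match is None:
            continue  # skip logs, configs, and other files
        epoch = int(match.group(1))  # compare numerically, so 21 > 9
        if epoch > best_epoch:
            best_epoch = epoch
            best_path = path
    return best_path


def resume_args(exp_dir) -> List[str]:
    """Build the extra command-line arguments for tools/train.py (hypothetical helper)."""
    ckpt = latest_checkpoint(exp_dir)
    return ["--resume", str(ckpt)] if ckpt is not None else []


if __name__ == "__main__":
    # Experiment directory taken from the command above.
    exp = "exps/thumos/adatad/e2e_actionformer_videomae_s_768x1_160_adapter/gpu2_id0"
    print(" ".join(resume_args(exp)))
```

The printed arguments can be appended to the `torchrun ... tools/train.py <config>` command. If no checkpoint is found, nothing is printed and training would start from scratch.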

Thanks.