
JointNLT

The official implementation for the CVPR 2023 paper Joint Visual Grounding and Tracking with Natural Language Specification.

[Models] [Raw Results] [Poster] [Slide]

Demo

(demo animation: JointNLT tracking with a natural-language description)

Framework

(framework overview figure)

Install the environment

Option 1: Use Anaconda (CUDA 11.3)

conda create -n joint python=3.7
conda env update -n joint -f joint.yaml
conda activate joint

Option 2: Use the packed Anaconda environment (CUDA 11.3). First download the [Env Package], then:

# Unpack the archived environment into your conda envs directory
mkdir -p $USER_ROOT$/anaconda3/envs/joint
tar -xzvf joint.tar.gz -C $USER_ROOT$/anaconda3/envs/joint
conda activate joint
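
Whichever option you use, you can sanity-check the result with a quick PyTorch probe (not part of the original setup):

# Check that PyTorch imports, was built against CUDA 11.3, and sees a GPU
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"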

Data Preparation

Put the tracking datasets in ./data. It should look like:

${JointNLT_ROOT}
 -- data
     -- LaSOT
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- LaSOTTest
         |-- airplane
         |-- bird
         |-- bus
         ...    
     -- LaSOTText
         |-- atv
         |-- badminton
         |-- cosplay
         ...  
     -- TNL2K_train
         |-- Arrow_Video_ZZ04_done
         |-- Assassin_video_1-Done 
         ...
     -- TNL2K_test
         |-- Assian_video_Z03_done
         |-- BF5_Blade_video_01-Done
         ...
     -- COCO
         |-- images
         |-- refcoco
         |-- refcoco+
         |-- refcocog
     -- OTB_sentences
         |-- OTB_query_test
         |-- OTB_query_train
         |-- OTB_videos
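
A quick sanity check for this layout (an illustrative snippet, not part of the repository; the folder names are taken from the tree above):

# Report any missing top-level dataset folders under ./data
for d in LaSOT LaSOTTest LaSOTText TNL2K_train TNL2K_test COCO OTB_sentences; do
    [ -d "data/$d" ] || echo "Missing: data/$d"
done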

Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
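
For reference, both files are plain Python settings modules. The excerpt below only illustrates the kind of entries to edit; the attribute names are assumptions based on similar pytracking-style projects, so check the generated files for the exact names:

# lib/test/evaluation/local.py -- illustrative excerpt, names may differ
settings.lasot_path = '/abs/path/to/data/LaSOTTest'       # assumed: LaSOT test videos
settings.tnl2k_path = '/abs/path/to/data/TNL2K_test'      # assumed: TNL2K test videos
settings.otb_path = '/abs/path/to/data/OTB_sentences'     # assumed: OTB99 videos and queries
settings.results_path = './output/test/tracking_results'  # where raw results are written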

Train JointNLT

Download the pretrained [BERT pretrained weight] and put it under $PROJECT_ROOT$/pretrained.

Train with multiple GPUs using DDP:

# JointNLT
python tracking/train.py --script jointnlt --config swin_b_ep300 --save_dir log/swin_ep300 --mode multiple --nproc_per_node 4
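
The command scales to a different GPU count by changing only --nproc_per_node; note that the effective batch size scales with the number of processes, so the learning rate may need retuning:

# Example: the same run on 2 GPUs instead of 4
python tracking/train.py --script jointnlt --config swin_b_ep300 --save_dir log/swin_ep300 --mode multiple --nproc_per_node 2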

Evaluation

  • LaSOT/TNL2K/OTB99: download the model weights from Google Drive.

Put the downloaded weights in $PROJECT_ROOT$/checkpoints/.
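
For example (assuming the downloaded checkpoint keeps the name JointNLT_ep0300.pth.tar used by the evaluation commands below):

mkdir -p $PROJECT_ROOT$/checkpoints
mv ~/Downloads/JointNLT_ep0300.pth.tar $PROJECT_ROOT$/checkpoints/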

Change the corresponding values in lib/test/evaluation/local.py to the paths where the benchmarks are actually stored.

Evaluate with natural-language (NL) initialization (a combined loop over the three benchmarks is sketched after this list):

  • LaSOT or other offline-evaluated benchmarks (modify --dataset accordingly)
python tracking/test.py jointnlt swin_b_ep300 --dataset lasot --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name lasot --tracker_param swin_b_ep300
  • TNL2K
python tracking/test.py jointnlt swin_b_ep300 --dataset tnl2k --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name tnl2k --tracker_param swin_b_ep300
  • OTB99
python tracking/test.py jointnlt swin_b_ep300 --dataset otb --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name otb --tracker_param swin_b_ep300
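
Since the three benchmarks differ only in the dataset arguments, the command pairs above can also be run as one shell loop (a convenience sketch, equivalent to running them one by one):

for ds in lasot tnl2k otb; do
    python tracking/test.py jointnlt swin_b_ep300 --dataset $ds --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
    python tracking/analysis_results.py --dataset_name $ds --tracker_param swin_b_ep300
done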

Evaluate with bounding-box and natural-language (Box + NL) initialization:

  • LaSOT or other offline-evaluated benchmarks (modify --dataset accordingly)
python tracking/test.py jointnlt swin_b_ep300_track --dataset lasot --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name lasot --tracker_param swin_b_ep300_track
  • TNL2K
python tracking/test.py jointnlt swin_b_ep300_track --dataset tnl2k --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name tnl2k --tracker_param swin_b_ep300_track
  • OTB99
python tracking/test.py jointnlt swin_b_ep300_track --dataset otb --threads 16 --num_gpus 4 --params__model JointNLT_ep0300.pth.tar
python tracking/analysis_results.py --dataset_name otb --tracker_param swin_b_ep300_track

Evaluate the grounding performance.

Note: We evaluate grounding on the val split of RefCOCOg to show the grounding performance of our method.

# Evaluate grounding with swin_b_ep300
python tracking/test_grounding.py --script jointnlt --config swin_b_ep300 --ckpt checkpoints/JointNLT_ep0300.pth.tar

Test FLOPs and Speed

Note: The speeds reported in our paper were tested on a single RTX 3090 GPU.

# Profiling swin_b_ep300
python tracking/profile_model.py --script jointnlt --config swin_b_ep300 --display_name 'JointNLT'

Contact

Li Zhou: lizhou.hit@gmail.com

Acknowledgments

Citation

If our work is useful for your research, please consider citing:

@misc{zhou2023joint,
      title={Joint Visual Grounding and Tracking with Natural Language Specification}, 
      author={Li Zhou and Zikun Zhou and Kaige Mao and Zhenyu He},
      year={2023},
      eprint={2303.12027},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}