This repository is the official implementation of the ACM MM 2024 paper [Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking].
```bash
conda create -n iimf python=3.8
conda activate iimf
bash install.sh
```
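To confirm the environment is set up correctly, a quick sanity check (our suggestion, not a script shipped with the repo) is:

```bash
# Verify that PyTorch imports and a CUDA device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```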
Run the following command to set the paths for this project:

```bash
python tracking/create_default_local_file.py --workspace_dir . --data_dir ../ --save_dir ./output
```
The --data_dir should point to the directory that contains your datasets. For example, if you want to train on LasHeR and LasHeR is located in ../data, pass --data_dir ../data. After running this command, you can still modify the paths by editing these two files:
```
lib/train/admin/local.py      # paths about training
lib/test/evaluation/local.py  # paths about testing
```
Note that you should check lib/test/evaluation/local.py before testing the model: the lasher_test path is not always set correctly, and you may need to set it yourself, e.g.
```python
settings.lasher_path = '/username/lasher/testingset'
```
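For reference, the generated local.py follows the OSTrack convention of assigning absolute paths to a settings object. The sketch below is illustrative only (the function name, the EnvSettings class, and all paths other than lasher_path are assumptions based on that convention; keep what create_default_local_file.py generates and adjust just the entries you need):

```python
# lib/test/evaluation/local.py -- illustrative sketch, not the generated file verbatim
from lib.test.evaluation.environment import EnvSettings

def local_env_settings():
    settings = EnvSettings()
    # Placeholder paths; adapt them to your machine.
    settings.prj_dir = '/home/username/IIMF'              # project root
    settings.save_dir = '/home/username/IIMF/output'      # output directory
    settings.results_path = settings.save_dir + '/test/tracking_results'
    settings.lasher_path = '/username/lasher/testingset'  # LasHeR test split (from this README)
    return settings
```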
Put the tracking datasets in the path you set. It should look like:
```
-- lasher
    |-- trainingset
    |-- testingset
    |-- trainingsetList.txt
    |-- testingsetList.txt
    ...
```
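If you want to verify the layout before training, a small helper like the one below (hypothetical, not part of the repo; the names are taken from the tree above) can catch path mistakes early:

```python
# Hypothetical helper: verify the expected LasHeR layout
import os

root = '../data/lasher'  # adjust to match your --data_dir
expected = ['trainingset', 'testingset', 'trainingsetList.txt', 'testingsetList.txt']
for name in expected:
    path = os.path.join(root, name)
    print(('OK     ' if os.path.exists(path) else 'MISSING'), path)
```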
Download the ImageNet or SOT pre-trained weights and put them under $PROJECT_ROOT$/pretrained_models. The pre-trained weights are provided by TBSI, and we thank the authors for their contribution.
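For example (the file name below is a placeholder; use whatever weight file your chosen config expects):

```bash
mkdir -p pretrained_models
# 'pretrained_weight.pth' is a placeholder name for the downloaded file
mv /path/to/pretrained_weight.pth pretrained_models/
```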
Then run one of the following training commands:

```bash
python tracking/train.py --script tbsi_track --config vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --save_dir ./output/vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --mode multiple --nproc_per_node 4
python tracking/train.py --script tbsi_track --config vitb_256_tbsi_32x1_1e4_lasher_15ep_sot --save_dir ./output/vitb_256_tbsi_32x1_1e4_lasher_15ep_sot --mode multiple --nproc_per_node 4
```
Replace --config with the desired model config under experiments/tbsi_track.
With four GPUs available, set --nproc_per_node 4. With only one GPU, set --mode single; --nproc_per_node is then not needed.
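For instance, the single-GPU counterpart of the first training command above is:

```bash
# Single-GPU training: --mode single, no --nproc_per_node
python tracking/train.py --script tbsi_track --config vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --save_dir ./output/vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --mode single
```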
Put the checkpoint into $PROJECT_ROOT$/output/config_name/... or modify the checkpoint path in the testing code.
Test on LasHeR:

```bash
CUDA_VISIBLE_DEVICES=3 python tracking/test.py tbsi_track vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --dataset_name lasher_test --threads 6 --num_gpus 1
CUDA_VISIBLE_DEVICES=3 python tracking/test.py tbsi_track vitb_256_tbsi_32x1_1e4_lasher_15ep_sot --dataset_name lasher_test --threads 6 --num_gpus 1
```

Then evaluate the raw results:

```bash
python tracking/analysis_results.py --tracker_name tbsi_track --tracker_param vitb_256_tbsi_32x4_4e4_lasher_15ep_in1k --dataset_name lasher_test
python tracking/analysis_results.py --tracker_name tbsi_track --tracker_param vitb_256_tbsi_32x1_1e4_lasher_15ep_sot --dataset_name lasher_test
```
| Model | Backbone | Pretraining | Precision | NormPrec | Success | Checkpoint | Raw Result |
|-------|----------|-------------|-----------|----------|---------|------------|------------|
| IIMF  | ViT-Base | SOT         | 72.35     | 68.39    | 58.07   | download   | download   |
Note that our model was trained and tested on four V100 GPUs; different GPUs may lead to slightly different test results. The checkpoint and raw results we provide were all obtained on V100 GPUs.
Our project is developed on top of OSTrack and TBSI. We thank the authors for their contributions, which helped us implement our ideas quickly.
If our work is useful for your research, please consider citing it as follows:
```bibtex
@inproceedings{IIMF,
  title={Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking},
  author={Chen, Liqiu and Huang, Yuqing and Li, Hengyu and Zhou, Zikun and He, Zhenyu},
  booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
  year={2024}
}
```