
[ICCV 2023] Robust Object Modeling for Visual Tracking, Official Implementation


ROMTrack

The official implementation of the ICCV 2023 paper Robust Object Modeling for Visual Tracking

[CVF Open Access] [Poster] [Video]

[Figure: ROMTrack pipeline]

[Models and Raw Results] (Google Drive) [Models and Raw Results] (Baidu Netdisk: romt)

Base Models

| Variant | ROMTrack | ROMTrack-384 |
|---|---|---|
| Model Setting | ViT-Base | ViT-Base |
| Pretrained Method | MAE | MAE |
| Pretrained Weight | MAE checkpoint | MAE checkpoint |
| Template / Search | 128×128 / 256×256 | 192×192 / 384×384 |
| GOT-10k (AO / SR 0.5 / SR 0.75) | 72.9 / 82.9 / 70.2 | 74.2 / 84.3 / 72.4 |
| LaSOT (AUC / Norm P / P) | 69.3 / 78.8 / 75.6 | 71.4 / 81.4 / 78.2 |
| TrackingNet (AUC / Norm P / P) | 83.6 / 88.4 / 82.7 | 84.1 / 89.0 / 83.7 |
| LaSOT_ext (AUC / Norm P / P) | 48.9 / 59.3 / 55.0 | 51.3 / 62.4 / 58.6 |
| TNL2K (AUC / Norm P / P) | 56.9 / 73.7 / 58.1 | 58.0 / 75.0 / 59.6 |
| NFS / OTB / UAV (AUC) | 68.0 / 71.4 / 69.7 | 68.8 / 70.9 / 70.5 |
| VOT2020 BBox (EAO / A / R) | 0.326 / 0.480 / 0.816 | 0.329 / 0.483 / 0.822 |
| GPU FPS / MACs(G) / Params(M) | 116 / 34.5 / 92.1 | 67 / 77.7 / 92.1 |
| CPU FPS | 9.9 | 3.0 |
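
For readers unfamiliar with the GOT-10k columns: AO (average overlap) is the mean IoU between predicted and ground-truth boxes, and SR at a threshold is the fraction of frames whose IoU exceeds that threshold. The sketch below illustrates these definitions on made-up boxes. The function names and the `(x, y, w, h)` box format are assumptions for illustration; the numbers in the table come from the benchmark's own evaluation, which also handles aggregation across sequences, not from this code.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(ious: Sequence[float]) -> float:
    """AO: mean IoU over all frames."""
    return sum(ious) / len(ious)


def success_rate(ious: Sequence[float], threshold: float) -> float:
    """SR: fraction of frames whose IoU exceeds the threshold."""
    return sum(1 for v in ious if v > threshold) / len(ious)


# Made-up predictions and ground truth for three frames (illustration only).
preds = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
ious = [iou(p, g) for p, g in zip(preds, gts)]

print("AO      :", round(average_overlap(ious), 3))
print("SR 0.5  :", round(success_rate(ious, 0.5), 3))
print("SR 0.75 :", round(success_rate(ious, 0.75), 3))
```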

Extended Models (Efficiency-Oriented)

| Variant | ROMTrack-Tiny-256 | ROMTrack-Small-256 |
|---|---|---|
| Model Setting | ViT-Tiny | ViT-Small |
| Pretrained Method | Supervised on ImageNet-22k | Supervised on ImageNet-22k |
| Pretrained Weight | Timm checkpoint | Timm checkpoint |
| Template / Search | 128×128 / 256×256 | 128×128 / 256×256 |
| LaSOT (AUC / Norm P / P) | 59.3 / 68.8 / 60.4 | 62.3 / 72.3 / 65.3 |
| TrackingNet (AUC / Norm P / P) | 75.8 / 81.7 / 71.5 | 78.5 / 84.3 / 75.3 |
| LaSOT_ext (AUC / Norm P / P) | 40.4 / 49.7 / 43.1 | 43.2 / 52.9 / 47.1 |
| TNL2K (AUC / Norm P / P) | 48.6 / 64.4 / 45.5 | 52.0 / 68.7 / 50.5 |
| NFS / OTB / UAV (AUC) | 62.5 / 68.5 / 62.9 | 65.3 / 68.9 / 66.4 |
| VOT2020 BBox (EAO / A / R) | 0.265 / 0.459 / 0.704 | 0.297 / 0.477 / 0.764 |
| GPU FPS / MACs(G) / Params(M) | 466 / 2.7 / 8.0 | 236 / 9.3 / 25.4 |
| CPU FPS | 36.6 | 17.2 |

Extended Models (Performance-Oriented)

| Variant | ROMTrack-Large-384 |
|---|---|
| Model Setting | ViT-Large |
| Pretrained Method | MAE |
| Pretrained Weight | MAE checkpoint |
| Template / Search | 192×192 / 384×384 |
| LaSOT (AUC / Norm P / P) | 72.0 / 81.7 / 79.1 |
| TrackingNet (AUC / Norm P / P) | 85.2 / 89.8 / 85.4 |
| LaSOT_ext (AUC / Norm P / P) | 52.9 / 64.3 / 60.9 |
| TNL2K (AUC / Norm P / P) | 60.4 / 77.7 / 63.9 |
| NFS / OTB / UAV (AUC) | 69.2 / 71.0 / 71.5 |
| VOT2020 BBox (EAO / A / R) | 0.338 / 0.492 / 0.820 |
| GPU FPS / MACs(G) / Params(M) | 21 / 266.5 / 311.3 |
| CPU FPS | 1.1 |

📰 News

[May 2, 2024]

  • We release the extended model ROMTrack-Large-384 for Performance-Oriented Visual Tracking!
  • Models and Raw Results for all versions of ROMTrack are available on Google Drive or Baidu Netdisk.
  • Code and script for VOT2020 evaluation are available now.

[April 18, 2024]

  • We release the extended models ROMTrack-Tiny-256 and ROMTrack-Small-256 for Efficient Visual Tracking!
  • We provide detailed information for all versions of ROMTrack; see Base Models and Extended Models above.

[April 17, 2024]

  • The repository upgrade is complete! Training and evaluation with PyTorch 2.2.0 and Python 3.8 are more efficient.
  • Training and Evaluation Devices for the upgraded code: RTX A6000, Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, Ubuntu 20.04.1 LTS.

[March 25, 2024]

  • We upgrade the implementation to Python 3.8 and PyTorch 2.2.0!
  • We update results on TNL2K!
  • We update FPS metrics on RTX A6000 GPU for reference.

[March 21, 2024]

  • We update 2 radar plots for visualization on LaSOT and LaSOT_ext.
  • We post a blog on Zhihu; you are welcome to read it.

[October 18, 2023]

  • We update the paper to the CVF Open Access version.
  • We release poster and video.

[September 21, 2023]

  • We release Models and Raw Results of ROMTrack.
  • We refine README for more details.

[August 6, 2023]

  • We release Code of ROMTrack.

[July 14, 2023]

  • ROMTrack is accepted to ICCV 2023!

📆 TODO

  • Extended Models (Efficiency-Oriented & Performance-Oriented) for ROMTrack
  • Repository Upgrade
  • More Analysis (Radar Plot) and More Results (TNL2K Dataset)
  • Code for ROMTrack
  • Model Zoo and Raw Results
  • Refine README

⭐ Highlights

🚀 New Tracking Framework Pursuing Robustness

  • ROMTrack employs a robust object modeling design that simultaneously preserves the inherent information of the target template and enables mutual feature matching between the target and the search region.

[Figure: robust object modeling]

  • Robustness Comparison with SOTA methods (bounding box only) on VOT2020.

    [Figure: robustness comparison on VOT2020]
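
The robust object modeling idea described in this section can be read as a constraint on which tokens may attend to which. In the sketch below, a group of "inherent" template tokens attends only to itself, so its features are never mixed with the search region, while a second group of template tokens (called "hybrid" here) and the search tokens attend to each other for mutual matching. This is an illustrative reading, not the repository's implementation: the group names, the token order, the token counts, the `build_attention_mask` helper, and the `read_inherent` flag are all assumptions.

```python
from typing import List


def build_attention_mask(n_inherent: int, n_hybrid: int, n_search: int,
                         read_inherent: bool = True) -> List[List[bool]]:
    """Return mask[q][k] = True when query token q may attend to key token k.

    Assumed token order: inherent template, hybrid template, search region.
    """
    total = n_inherent + n_hybrid + n_search
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            q_inherent = q < n_inherent
            k_inherent = k < n_inherent
            if q_inherent:
                # Inherent template queries see only inherent template keys,
                # so their features stay free of search-region information.
                mask[q][k] = k_inherent
            else:
                # Hybrid template and search queries match against each other
                # and, if read_inherent is set, may also read inherent keys.
                mask[q][k] = (not k_inherent) or read_inherent
    return mask


# Toy sizes for illustration: 2 inherent, 2 hybrid, 3 search tokens.
mask = build_attention_mask(n_inherent=2, n_hybrid=2, n_search=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real transformer layer, such a boolean mask would be converted to additive values (zero where attention is allowed, a large negative number elsewhere) before the softmax.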

🚀 Strong Performance and Comparable Speed

  • Performance on Benchmarks

    [Figure: performance on benchmarks]

  • Radar Analysis on LaSOT and LaSOT_ext

    [Figures: radar plots for LaSOT and LaSOT_ext]

  • Speed, MACs, Params (tested on 1080Ti)

    [Figure: speed, MACs, and params comparison]

📖 Install the environment

Use Anaconda:

conda create -n romtrack python=3.8
conda activate romtrack
bash install_pytorch.sh
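
After the install script finishes, a quick sanity check can confirm that the interpreter and PyTorch versions match what the News section reports for the upgraded code (Python 3.8 and PyTorch 2.2.0). The following is a minimal sketch; `check_environment` is a hypothetical helper that is not part of the repository, and it only reports what it finds.

```python
import sys
from importlib import metadata


def check_environment() -> dict:
    """Report the running Python version and the installed torch version."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,  # stays None when torch is not installed
    }
    try:
        report["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        pass
    return report


report = check_environment()
print("Python :", report["python"], "(README uses 3.8)")
print("PyTorch:", report["torch"] or "not installed", "(README uses 2.2.0)")
```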

📖 Data Preparation

Put the tracking datasets in ./data. It should look like:

${ROMTrack_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- lasot_ext
         |-- atv
         |-- badminton
         |-- cosplay
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
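
Before training, it can help to verify that the folders shown above are in place. The sketch below checks only the directory names that appear in the tree; `EXPECTED_LAYOUT` and `missing_dataset_dirs` are hypothetical names introduced for illustration, the LaSOT category folders are not checked individually, and the TrackingNet "..." is read as the full range TRAIN_0 through TRAIN_11.

```python
import os
from typing import Dict, List

# Dataset folder -> required subfolders, taken from the tree above.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "lasot": [],        # category folders (airplane, ...) are not enumerated
    "lasot_ext": [],    # category folders (atv, ...) are not enumerated
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "train2017"],
    # Assumption: "..." in the tree means TRAIN_0 through TRAIN_11.
    "trackingnet": ["TRAIN_%d" % i for i in range(12)] + ["TEST"],
}


def missing_dataset_dirs(data_dir: str) -> List[str]:
    """Return the expected directories that do not exist under data_dir."""
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        root = os.path.join(data_dir, dataset)
        if not os.path.isdir(root):
            missing.append(root)
            continue  # no point checking subfolders of a missing root
        for sub in subdirs:
            path = os.path.join(root, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing


for path in missing_dataset_dirs("./data"):
    print("missing:", path)
```

An empty output means every listed folder was found. Datasets that are not needed for a particular experiment can be ignored.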

📖 Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

📖 Train ROMTrack

Training runs on multiple GPUs using DDP. Other training settings can be found in tracking/train_romtrack.sh.

bash tracking/train_romtrack.sh

📖 Test and evaluate ROMTrack on benchmarks

  • LaSOT/LaSOT_ext/GOT10k-test/TrackingNet/OTB100/UAV123/NFS30.
    • Test settings can be found in tracking/test_romtrack.sh.

bash tracking/test_romtrack.sh

  • VOT2020. The current versions are vot-toolkit (==0.5.3) and vot-trax (==3.0.3).
    • The commands below use ROMTrack-Large-384 as an example.

### Evaluate ROMTrack-Large-384 with AlphaRefine
vot evaluate --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384_AR
vot analysis --nocache --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384_AR

### Evaluate ROMTrack-Large-384 without AlphaRefine
vot evaluate --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384
vot analysis --nocache --workspace ./external/vot2020/ROMTrack_large_384 ROMTrack_large_384

📖 Compute FLOPs/Params and test speed

bash tracking/profile_romtrack.sh
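
The speed figures in the tables above are frames per second. As a rough illustration of how such a number is commonly obtained (a few warm-up iterations followed by a timed loop), here is a minimal sketch. It does not reproduce the contents of tracking/profile_romtrack.sh; `measure_fps` and the dummy `track_one_frame` workload are hypothetical stand-ins, and timing a real GPU model would also require synchronizing the device before reading the clock.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], None], warmup: int = 10, iters: int = 100) -> float:
    """Run `step` repeatedly and return iterations per second."""
    # Warm-up iterations are excluded from timing (caches, lazy init, etc.).
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return iters / elapsed


def track_one_frame() -> None:
    # Dummy workload standing in for one tracker forward pass.
    sum(i * i for i in range(10_000))


fps = measure_fps(track_one_frame)
print(f"{fps:.1f} FPS")
```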

📖 Visualization

We provide attention maps and feature maps for several sequences on LaSOT. Detailed analysis can be found in our paper.

[Figure: attention maps and feature maps for LaSOT sequences]

🔖 Acknowledgments

  • Thanks to the STARK, PyTracking, and MixFormer libraries, which helped us quickly implement our ideas and test our performance.
  • Our implementation of the ViT is modified from the Timm repo.

📝 Citation

If our work is useful for your research, please feel free to star ⭐ and cite our paper:

@InProceedings{Cai_2023_ICCV,
    author    = {Cai, Yidong and Liu, Jie and Tang, Jie and Wu, Gangshan},
    title     = {Robust Object Modeling for Visual Tracking},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9589-9600}
}