Single/Multiple Object Tracking and Segmentation

Code and comparison of recent single/multiple object tracking and segmentation methods.

News

💥 CNNInMo/TransInMo is accepted by IJCAI2022.

💥 CSTrack is accepted by IEEE TIP.

💥 OMC is accepted by AAAI2022. The training and testing code has been released in this codebase.

💥 AutoMatch is accepted by ICCV2021. The training and testing code has been released in this codebase.

💥 CSTrack ranks 5th out of 4000 in the Tianchi Global AI Competition.

💥 Ocean is accepted by ECCV2020. [OceanPlus] is accepted by IEEE TIP.

💥 SiamDW is accepted by CVPR2019 and selected for an oral presentation.

Supported Trackers (SOT and MOT)

Single-Object Tracking (SOT)

Multi-Object Tracking (MOT)

Results Comparison


Branches

SOT (or master): for our SOT trackers
MOT: for our MOT trackers
v0: old codebase supporting OceanPlus and TensorRT testing.

Please clone the branch that suits your needs.

Structure

experiments: training and testing settings
demo: figures for the README
dataset: testing datasets
data: training datasets
lib: core scripts for all trackers
snapshot: pre-trained models
pretrain: models trained on ImageNet (for training)
tracking: training and testing interfaces

$SOTS
|—— experiments
|—— lib
|—— snapshot
  |—— xxx.model
|—— dataset
  |—— VOT2019.json 
  |—— VOT2019
     |—— ants1...
  |—— VOT2020
     |—— ants1...
|—— ...

Performance

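| Model | OTB2015 | GOT10K | LaSOT | TNL2K | TrackingNet | NFS30 | TOTB | VOT2019 | TC128 | UAV123 |
|---|---|---|---|---|---|---|---|---|---|---|
| SiamDW | 0.670 | 0.429 | 0.386 | 0.348 | 61.1 | 0.521 | 0.500 | 0.241 | 0.583 | 0.536 |
| Ocean | 0.676 | 0.615 | 0.517 | 0.421 | 69.2 | 0.553 | 0.638 | 0.323 | 0.585 | 0.621 |
| AutoMatch | 0.714 | 0.652 | 0.583 | 0.472 | 76.0 | 0.606 | 0.668 | 0.322 | 0.634 | 0.644 |
| CNNInMo | 0.703 | - | 0.539 | 0.422 | 72.1 | 0.560 | - | - | - | 0.629 |
| TransInMo | 0.711 | - | 0.657 | 0.520 | 81.7 | 0.668 | - | - | - | 0.690 |

TrackingNet scores are given in percent. All other columns are on a 0-1 scale. "-" marks benchmarks without a reported result.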

Tracker Details

CNNInMo/TransInMo [IJCAI2022]

[Paper] [Raw Results] [Training and Testing Tutorial]
CNNInMo/TransInMo introduces a novel mechanism that performs branch-wise interactions inside the visual tracking backbone network (InBN) via the proposed general interaction modeler (GIM). We show that both CNN and Transformer backbones benefit from InBN, which lets them learn more robust feature representations. Applying these backbones to Siamese tracking yields compelling tracking performance.
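The following sketch shows roughly what a branch-wise interaction inside the backbone can look like. It assumes single-head, projection-free cross-attention. Each search-region token attends to all template tokens, and the attended template context is added back to it residually. This is a generic illustration and not the GIM itself. `cross_branch_interaction` and the toy token values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_branch_interaction(search: List[Vector], template: List[Vector]) -> List[Vector]:
    """Hypothetical interaction step: every search token attends to all
    template tokens (scaled dot-product, no learned projections), and the
    attended template context is added to the search token residually."""
    scale = 1.0 / math.sqrt(len(template[0]))
    fused = []
    for s in search:
        weights = softmax([dot(s, t) * scale for t in template])
        context = [sum(w * t[d] for w, t in zip(weights, template))
                   for d in range(len(s))]
        fused.append([x + c for x, c in zip(s, context)])
    return fused


if __name__ == "__main__":
    template_tokens = [[1.0, 0.0], [0.0, 1.0]]
    search_tokens = [[1.0, 0.0], [0.5, 0.5]]
    for token in cross_branch_interaction(search_tokens, template_tokens):
        print([round(v, 3) for v in token])
```

A real network would presumably use learned query/key/value projections for this step and apply it at several backbone stages. In that setup, later layers would see search features that are already conditioned on the template.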

OMC [AAAI2022]

[Paper] [Training and Testing Tutorial]
OMC introduces a double-check mechanism that allows "fake background" (real targets misclassified as background) to be tracked again. Specifically, we design a re-check network as an auxiliary to the initial detections. If a target is missing from the first-check predictions (i.e., the output of the object detector), it is treated as a potentially misclassified target. The re-check network can then restore it by searching for targets through mining temporal cues. Note that the re-check network extends the role of ID embeddings from data association to motion forecasting by propagating previous tracklets to the current frame with small overhead. Even with multiple tracklets, the re-check network propagates all of them in one forward pass via a simple matrix multiplication. Building on the strong CSTrack baseline, we construct a new one-shot tracker that achieves favorable gains.
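The single-pass propagation described above can be illustrated with a small sketch. It assumes that each previous tracklet is represented by one L2-normalized ID embedding and the current frame by a grid of L2-normalized per-pixel embeddings. Stacking the tracklet embeddings into an N x D matrix and multiplying it by the D x HW frame embedding matrix gives all N similarity maps at once. The peak of each map serves as a coarse predicted location. The function names and toy data are hypothetical, and the actual re-check network in OMC contains further learned components.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (N x D) @ (D x M) matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def propagate_tracklets(tracklet_emb: Matrix, frame_emb: Matrix,
                        width: int) -> List[Tuple[int, int, float]]:
    """Hypothetical sketch of single-pass propagation.

    tracklet_emb: N x D, one ID embedding per previous tracklet.
    frame_emb:    HW x D, one embedding per current-frame pixel (row-major).
    Returns one (row, col, similarity) peak per tracklet.
    """
    queries = [l2_normalize(t) for t in tracklet_emb]   # N x D
    keys = [l2_normalize(p) for p in frame_emb]         # HW x D
    keys_t = [list(col) for col in zip(*keys)]          # D x HW
    sims = matmul(queries, keys_t)  # N x HW: every tracklet in one product
    peaks = []
    for row in sims:
        idx = max(range(len(row)), key=row.__getitem__)
        peaks.append((idx // width, idx % width, row[idx]))
    return peaks
```

The number of tracklets only changes the height of the first matrix. That is why propagating many tracklets remains a single product, as the description above states.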

AutoMatch [ICCV2021]

[Paper] [Raw Results] [Training and Testing Tutorial] [Demo]
AutoMatch replaces the core of Siamese tracking, i.e., cross-correlation and its variants, with a learnable matching network. The underlying motivation is that heuristic matching network design relies heavily on expert experience. Moreover, we find experimentally that a single matching operator can hardly guarantee stable tracking in all challenging environments. In this work, we introduce six novel matching operators from the perspective of feature fusion instead of explicit similarity learning, namely Concatenation, Pointwise-Addition, Pairwise-Relation, FiLM, Simple-Transformer and Transductive-Guidance, to broaden the choice of matching operators. Our analysis reveals that each operator adapts selectively to different types of environmental degradation, which inspires us to combine them to exploit complementary features. We propose binary channel manipulation (BCM) to search for the optimal combination of these operators.
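The sketch below gives a rough illustration of how matching operators can be combined. It implements two of the listed operators in a simplified, parameter-free form:

- Pointwise-Addition
- a FiLM-style per-channel scale and shift derived from the pooled template

It then merges their outputs with a fixed 0/1 selection mask over operators. This mask is only a stand-in. In AutoMatch, BCM is the procedure that searches for the combination, and the operators are learned modules. All function names here are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence

Features = List[List[float]]  # C channels x N spatial positions (flattened)


def pointwise_addition(search: Features, template: Features) -> Features:
    """Add the spatially pooled template value of each channel to every search position."""
    return [[x + mean(t) for x in s] for s, t in zip(search, template)]


def film(search: Features, template: Features) -> Features:
    """FiLM-style modulation gamma * x + beta per channel. gamma and beta are
    parameter-free stand-ins (derived from the pooled template) for what
    would normally be learned projections."""
    out = []
    for s, t in zip(search, template):
        pooled = mean(t)
        gamma, beta = 1.0 + pooled, pooled
        out.append([gamma * x + beta for x in s])
    return out


OPERATORS: List[Callable[[Features, Features], Features]] = [pointwise_addition, film]


def binary_combine(search: Features, template: Features, mask: Sequence[int]) -> Features:
    """Keep the operators whose mask bit is 1 and average their outputs element-wise."""
    chosen = [op for op, bit in zip(OPERATORS, mask) if bit]
    if not chosen:
        raise ValueError("mask must select at least one operator")
    outputs = [op(search, template) for op in chosen]
    # zip(*outputs) groups channels across operators; zip(*channels) groups positions.
    return [[mean(vals) for vals in zip(*channels)] for channels in zip(*outputs)]
```

With a hard 0/1 mask, every candidate combination is just a different bit pattern. This is what makes the choice of operators a discrete search problem.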

Ocean [ECCV2020]

[Paper] [Raw Results] [Training and Testing Tutorial] [Demo]

Ocean proposes a general anchor-free tracking framework. It includes a pixel-based anchor-free regression network that addresses the weak rectification problem of RPN. It also includes an object-aware classification network that learns robust target-related representations. Moreover, we introduce an effective multi-scale feature combination module to replace the heavy result-fusion mechanisms used in recent Siamese trackers. This work also serves as the baseline model of OceanPlus. An additional TensorRT toy demo is provided in this repo.
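A pixel-based anchor-free regression head predicts, for each location on the output feature map, the distances from that location to the four sides of the target box. The decoding step can be sketched as below. The sketch assumes that feature-map location (i, j) maps back to the image point (j * stride + offset, i * stride + offset) and that distances are expressed in image pixels. The `stride` and `offset` values and the function name are hypothetical, and Ocean's actual coordinate mapping may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]        # (x1, y1, x2, y2)
Distances = Tuple[float, float, float, float]  # (left, top, right, bottom)


def decode_anchor_free(distances: List[List[Distances]], scores: List[List[float]],
                       stride: int = 8, offset: float = 0.0) -> Tuple[Box, float]:
    """Take the highest-scoring feature-map location and convert its predicted
    side distances into a box in image coordinates.

    Assumes location (i, j) maps to image point (j * stride + offset,
    i * stride + offset); stride and offset are illustrative defaults.
    """
    height, width = len(scores), len(scores[0])
    best_i, best_j = max(((i, j) for i in range(height) for j in range(width)),
                         key=lambda ij: scores[ij[0]][ij[1]])
    x = best_j * stride + offset
    y = best_i * stride + offset
    left, top, right, bottom = distances[best_i][best_j]
    return (x - left, y - top, x + right, y + bottom), scores[best_i][best_j]
```

This decoding needs no anchor boxes. Every location regresses its own box directly, and the classification score decides which location is trusted.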

SiamDW [CVPR2019]

[Paper] [Raw Results] [Training and Testing Tutorial] [Demo]
SiamDW is one of the pioneering works to use deep backbone networks in the Siamese tracking framework. Based on a thorough analysis of network depth, output size, receptive field and padding mode, we propose guidelines for building backbone networks for Siamese trackers. Following these guidelines, we build several deeper and wider networks with the proposed CIR (cropping-inside-residual) module.
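Three of the quantities SiamDW analyzes (receptive field, network stride and output size) can be computed layer by layer with standard recurrences. A fourth, padding, enters through the output-size formula. The calculator below is generic and does not come from this repository. The toy layer stack is hypothetical and does not correspond to a specific SiamDW backbone.

```python
from typing import List, NamedTuple


class Layer(NamedTuple):
    kernel: int
    stride: int
    padding: int = 0


def analyze(layers: List[Layer], input_size: int) -> dict:
    """Receptive field, total stride and output size of a conv/pool stack.

    Standard recurrences, applied layer by layer:
      rf   <- rf + (kernel - 1) * jump
      jump <- jump * stride
      size <- (size + 2 * padding - kernel) // stride + 1
    """
    rf, jump, size = 1, 1, input_size
    for layer in layers:
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
    return {"receptive_field": rf, "total_stride": jump, "output_size": size}


if __name__ == "__main__":
    toy_stack = [Layer(7, 2), Layer(3, 2), Layer(3, 1), Layer(3, 1)]  # hypothetical
    print(analyze(toy_stack, input_size=127))
```

For this toy stack on a 127-pixel exemplar (the input size used by SiamFC-style trackers), the result is a receptive field of 27, a total stride of 4 and a 26x26 output. The receptive field therefore covers roughly a fifth of the exemplar. Setting `padding` lets you compare padded and unpadded variants of the same stack.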

OceanPlus [IEEE TIP]

[Paper] [Raw Results] [Training and Testing Tutorial] [Demo]
Official implementation of the OceanPlus tracker. It proposes an attention retrieval network (ARN) that imposes soft spatial constraints on backbone features. Concretely, we first build a look-up table (LUT) from the ground-truth mask in the first frame. We then query the LUT to obtain a target-aware attention map that suppresses the negative influence of background clutter. Furthermore, we introduce a multi-resolution multi-stage segmentation network (MMS) that further weakens background-clutter responses by reusing the predicted mask to filter backbone features.
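The look-up-table idea can be sketched as attention over first-frame features. Each first-frame pixel embedding is stored as a key, with its ground-truth mask label (1 for target, 0 for background) as the value. Every current-frame pixel then retrieves a soft target probability as a softmax-weighted average of those labels. This is a minimal sketch under that assumption. The temperature, `build_lut` and `retrieve_attention` are hypothetical, and ARN's actual design involves learned components.

```python
import math
from typing import List, Tuple

Vector = List[float]
LUT = List[Tuple[Vector, float]]


def build_lut(features: List[Vector], mask: List[int]) -> LUT:
    """Store (feature, label) pairs from the first frame; label 1 = target."""
    return [(f, float(m)) for f, m in zip(features, mask)]


def retrieve_attention(lut: LUT, query_features: List[Vector],
                       temperature: float = 1.0) -> List[float]:
    """For each query pixel: softmax over dot-product similarity to all LUT keys,
    then average the stored mask labels, giving an attention value in [0, 1]."""
    attention = []
    for q in query_features:
        logits = [sum(a * b for a, b in zip(q, key)) / temperature for key, _ in lut]
        peak = max(logits)
        weights = [math.exp(z - peak) for z in logits]
        total = sum(weights)
        attention.append(sum(w * label for w, (_, label) in zip(weights, lut)) / total)
    return attention
```

Pixels that resemble first-frame target pixels receive values near 1, and clutter that resembles the first-frame background receives values near 0. Multiplying backbone features by such a map is one way to realize the soft spatial constraint described above.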

CSTrack [IEEE TIP]

[Paper] [Training and Testing Tutorial] [Demo]
CSTrack proposes a strong ReID-based one-shot MOT framework. It includes a novel cross-correlation network that effectively drives the separate branches to learn task-dependent representations. It also includes a scale-aware attention network that learns discriminative embeddings to improve ReID capability. In addition, this work analyzes why data association is weak in one-shot MOT methods. With our improvements, the data association ability of our one-shot model is comparable to that of two-stage methods, while running faster.
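One-shot MOT trackers such as CSTrack associate detections with existing tracks largely through ReID embeddings. The sketch below shows embedding-based association in its simplest form. It builds a cosine-similarity matrix between track and detection embeddings, then matches pairs greedily above a threshold. Greedy matching is a simplification chosen for brevity, since many trackers use the Hungarian algorithm and also fuse motion cues. The threshold and function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def greedy_associate(tracks: List[Vector], detections: List[Vector],
                     min_similarity: float = 0.5
                     ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Return (matches, unmatched_track_ids, unmatched_detection_ids)."""
    # Score every (track, detection) pair, best first.
    pairs = sorted(((cosine(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for sim, ti, di in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_t = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_d = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_t, unmatched_d
```

Unmatched detections would typically start new tracks, and unmatched tracks are kept alive for a few frames. The more discriminative the embeddings, the fewer identity switches this step produces.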

If you are interested in our work or have any questions, please contact me at 201921060415@std.uestc.edu.cn.

Other trackers, coming soon ...

References

https://github.com/StrangerZhang/pysot-toolkit
...

neverstoplearn/SOTS