VastTrack: Vast Category Visual Object Tracking
Liang Peng*, Junyuan Gao*, Xinran Liu*, Weihong Li*, Shaohua Dong*, Zhipeng Zhang, Heng Fan
(*: equal contribution)
[arXiv] [Matlab Code] [Python Code]
Figure: We introduce VastTrack, a new large-scale benchmark that aims to facilitate general single object tracking with abundant object categories (over 2.1K classes) and videos (over 50K sequences). Here we show partial target trajectories in videos. Note that only a very small portion of the categories and videos is shown.
- Vast Object Category
- VastTrack contains 2,115 object classes, far more than the object categories covered by existing benchmarks
- Larger-scale Benchmark
- VastTrack comprises 50,610 videos with 4.2M frames, making it the largest benchmark by number of videos
- Rich Linguistic Description
- VastTrack provides a linguistic description for each sequence, collecting more than 50K descriptions
- High-quality and Dense Annotation
- VastTrack offers manual per-frame annotations for videos, building a high-quality platform for tracking
Figure: Visualization of several annotation examples along with the linguistic descriptions in the proposed VastTrack.
Figure: Overall evaluation of representative SOTA trackers from different years on VastTrack using PRE/NPRE/SUC.
Figure: Attribute-based evaluation of different tracking algorithms on VastTrack using SUC (more in the paper).
Figure: Qualitative results of eight representative trackers on different sequences containing different challenges.
More experimental results with analysis can be found in the paper.
Due to the large data size, we split VastTrack into multiple Zip files. Each file has the following organization:
part-1.zip
├── class-1
│   ├── video-1
│   │   ├── imgs
│   │   ├── nlp.txt
│   │   └── Groundtruth.txt
│   ├── video-2
│   │   ├── imgs
│   │   ├── nlp.txt
│   │   └── Groundtruth.txt
│   └── ...
└── class-2
    └── ...
part-2.zip
├── class-k
│   └── ...
...
To obtain the full version of VastTrack, you need to download all the Zip files using the links provided below.
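As an illustration of the layout above, the following minimal sketch enumerates sequences once the archives are unpacked. It assumes that every part was extracted into one common root folder, so that class folders sit directly under it; `iter_sequences` and the root path are hypothetical names, not part of the official toolkit.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_sequences(root: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (class_name, video_name, video_dir) for each sequence under root.

    Assumption: the layout is root/<class>/<video>/{imgs, nlp.txt, Groundtruth.txt}.
    """
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for video_dir in sorted(class_dir.iterdir()):
            # Keep only folders that actually contain an annotation file.
            if video_dir.is_dir() and (video_dir / "Groundtruth.txt").is_file():
                yield class_dir.name, video_dir.name, video_dir
```

For example, `for cls, vid, path in iter_sequences("VastTrack"): ...` would visit each video folder in sorted order, skipping folders without a `Groundtruth.txt` file.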
In each video folder, we provide the frames of the video in the imgs/ sub-folder, bounding box annotations in the Groundtruth.txt file, and the linguistic description in the nlp.txt file. The format of the bounding box is as follows: [x, y, width, height].
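A minimal sketch of reading such annotations is given here. It assumes one box per line in Groundtruth.txt, with the four values separated by commas or whitespace, and that (x, y) is the top-left corner of the box; neither assumption is stated in this README, and `parse_groundtruth` and `to_corners` are hypothetical helpers rather than toolkit functions.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def parse_groundtruth(text: str) -> List[Box]:
    """Parse one box per non-empty line (comma or whitespace separated; assumed)."""
    boxes: List[Box] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) != 4:
            raise ValueError(f"expected 4 values, got {len(parts)}: {line!r}")
        x, y, w, h = map(float, parts)
        boxes.append((x, y, w, h))
    return boxes


def to_corners(box: Box) -> Box:
    """Convert (x, y, width, height) to (x1, y1, x2, y2), assuming (x, y) is top-left."""
    x, y, w, h = box
    return (x, y, x + w, y + h)
```

The content of a Groundtruth.txt file can be passed in as a string, for instance via `Path(video_dir, "Groundtruth.txt").read_text()`. The corner form is often convenient when computing overlap between predicted and annotated boxes.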
Below are the download links for VastTrack. We offer two ways to download the data: OneDrive and Baidu Cloud Drive.
- OneDrive
- Baidu Cloud Drive
To check whether the downloaded files are complete, please refer to the MD5 files (MD5-Training and MD5-Test).
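One way to run this check in Python is sketched below. The layout of the MD5 files is not described here, so the sketch simply compares a locally computed digest against an expected hex string that you copy from the relevant MD5 file; `file_md5` and `is_complete` are hypothetical helper names.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_complete(path: str, expected_md5: str) -> bool:
    """Compare the file's digest with an expected value (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because the Zip parts are large; a mismatch usually means the download was interrupted and the part should be fetched again.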
Note: The training set of VastTrack contains 82 Zip files in total, and the category corresponding to each compressed package is specified in a JSON file. The test set consists of 15 Zip packages.
The metadata of VastTrack can be downloaded from OneDrive here.
We provide two variants of the evaluation toolkit, for Matlab and Python users.
The video sequences in VastTrack are collected from YouTube (under the Creative Commons Attribution 4.0 License), as it is currently the largest video platform and many of its videos come from the real world. We provide VastTrack for non-commercial research purposes only and are not responsible for the content of these videos.
If you find our VastTrack useful, please consider giving it a star and citing it. Thanks!
@article{peng2024vasttrack,
title={VastTrack: Vast Category Visual Object Tracking},
author={Peng, Liang and Gao, Junyuan and Liu, Xinran and Li, Weihong and Dong, Shaohua and Zhang, Zhipeng and Fan, Heng and Zhang, Libo},
journal={arXiv preprint arXiv:2403.03493},
year={2024}
}