VastTrack: Vast Category Visual Object Tracking

Liang Peng*, Junyuan Gao*, Xinran Liu*, Weihong Li*, Shaohua Dong*, Zhipeng Zhang, Heng Fan†, Libo Zhang†
(*: equal contribution; †: equal advising)
[arXiv] [Matlab Code] [Python Code]

Figure: We introduce VastTrack, a new large-scale benchmark that aims to facilitate general single object tracking with abundant object categories (over 2.1K classes) and videos (over 50K sequences). Here we show partial target trajectories in videos. Note that only a small fraction of the categories and videos is shown.

✨ Highlights

Vast Object Category
- VastTrack contains 2,115 object classes, far more than existing tracking benchmarks
Larger-scale Benchmark
- VastTrack comprises 50,610 videos with 4.2M frames, making it the largest benchmark in terms of number of videos
Rich Linguistic Description
- VastTrack provides a linguistic description for each sequence, collecting more than 50K descriptions
High-quality and Dense Annotation
- VastTrack offers manual per-frame annotations for videos, building a high-quality platform for tracking

📷 Samples

Figure: Visualization of several annotation examples along with the linguistic descriptions in the proposed VastTrack.

🚩 Benchmarking

🔹 Overall Evaluation of SOTA Trackers

Figure: Overall evaluation of representative SOTA trackers from different years on VastTrack using PRE/NPRE/SUC.

🔹 Attribute-based Evaluation

Figure: Attribute-based evaluation of different tracking algorithms on VastTrack using SUC (more in the paper).

🔹 Qualitative Evaluation

Figure: Qualitative results of eight representative trackers on different sequences containing different challenges.

More experimental results with analysis can be found in the paper.

🌐 Downloading VastTrack

🔹 Organization

Due to the large data size, we split VastTrack into multiple Zip files. Each file has the following organization:

part-1.zip
├── class-1
│   ├── video-1
│   │   ├── imgs
│   │   ├── nlp.txt
│   │   └── Groundtruth.txt
│   ├── video-2
│   │   ├── imgs
│   │   ├── nlp.txt
│   │   └── Groundtruth.txt
│   ...
└── class-2
    ...
part-2.zip
├── class-k
│   ...
...

To obtain the full version of VastTrack, download all the Zip files using the links provided below.
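Once the archives are extracted, the layout above can be traversed with a few lines of Python. The following is a minimal sketch, not part of the official toolkit. It assumes all Zip files were extracted so that the class folders sit directly under one root directory; the names `SequenceEntry` and `iter_sequences` are hypothetical.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class SequenceEntry(NamedTuple):
    """Paths belonging to one video sequence (hypothetical helper type)."""
    category: str
    name: str
    frames_dir: Path
    groundtruth: Path
    description: Path


def iter_sequences(root: Path) -> Iterator[SequenceEntry]:
    """Yield one entry per <class>/<video> folder found under root.

    Assumes the extracted layout root/<class>/<video>/{imgs, nlp.txt, Groundtruth.txt}.
    """
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            yield SequenceEntry(
                category=class_dir.name,
                name=video_dir.name,
                frames_dir=video_dir / "imgs",
                groundtruth=video_dir / "Groundtruth.txt",
                description=video_dir / "nlp.txt",
            )
```

If your extraction tool adds an extra top-level folder per archive, point `root` at that folder or adjust the traversal depth accordingly.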

🔹 Format of Each Video Sequence

In each video folder, we provide the frames of the video in the imgs/ sub-folder, bounding box annotations in the Groundtruth.txt file, and linguistic description in the nlp.txt file. The format of the bounding box is as follows: [x, y, width, height].
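As an illustration of the annotation format, here is a small sketch for reading `Groundtruth.txt`. It assumes one box per line (one line per frame), with the four values separated by commas or whitespace, and that `x, y` denote the top-left corner of the box; none of these details are stated above, so check them against the actual files. The function names are hypothetical.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def parse_groundtruth(text: str) -> List[Box]:
    """Parse annotation text into a list of (x, y, width, height) boxes.

    Assumption: one box per non-empty line, values separated by commas
    and/or whitespace.
    """
    boxes: List[Box] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [p for p in re.split(r"[,\s]+", line) if p]
        if len(parts) != 4:
            raise ValueError(f"line {line_no}: expected 4 values, got {len(parts)}")
        x, y, w, h = (float(p) for p in parts)
        boxes.append((x, y, w, h))
    return boxes


def xywh_to_xyxy(box: Box) -> Box:
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max).

    Assumption: (x, y) is the top-left corner.
    """
    x, y, w, h = box
    return (x, y, x + w, y + h)
```

A typical call would be `parse_groundtruth(open(path).read())`, where `path` points at a sequence's `Groundtruth.txt`.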

🔹 Downloading Links

Below are the downloading links of VastTrack. We offer two ways, OneDrive and Baidu Cloud Drive, to download the data.

OneDrive
- The downloading link for the training set is here.
- The downloading link for the test set is here.
Baidu Cloud Drive
- The downloading link for the training set is here (you may need the extraction code: qs2c).
- The downloading link for the test set is here (you may need the extraction code: Vast).

To verify that the downloaded files are complete, check them against the MD5 files (MD5-Training and MD5-Test).
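For users who prefer to script the check, the sketch below computes the MD5 digest of a downloaded archive with Python's standard `hashlib` module and compares it to an expected value. The layout of the MD5 files is not described here, so the sketch takes the expected hex string as a plain argument; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the affected Zip file should be downloaded again.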

Note: The training set of VastTrack contains 82 Zip files in total, and the category corresponding to each compressed package is specified in a JSON file. The test set consists of 15 Zip packages.

🔹 Meta Data

The meta data of VastTrack can be downloaded from OneDrive here.

📏 Evaluation Toolkit

We provide two variants of the evaluation toolkit, one for Matlab users and one for Python users.
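For readers who want a feel for what an overlap-based score measures before installing a toolkit, here is an illustrative sketch. It is not the toolkit's code. It assumes the definition commonly used in the tracking literature: the success score is the fraction of frames whose intersection-over-union (IoU) with the ground truth exceeds a threshold, averaged over thresholds evenly spaced in [0, 1]. The number of thresholds (21) and the function names are assumptions; use the official toolkits for reported numbers.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_score(preds: Sequence[Box], gts: Sequence[Box],
                  num_thresholds: int = 21) -> float:
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    Assumed convention (not taken from the VastTrack toolkit): a frame
    counts as a success at threshold t if its IoU is strictly greater than t.
    """
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    if not overlaps:
        return 0.0
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Both functions take boxes in the same `[x, y, width, height]` format used by the annotation files, so predictions and ground truth can be compared frame by frame.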

📝 License

The video sequences in VastTrack are collected from YouTube (under the Creative Commons Attribution 4.0 License), as it is currently the largest video platform and many of its videos come from the real world. We provide VastTrack for non-commercial research purposes only and are not responsible for the content of these videos.

🎈 Citation

If you find our VastTrack useful, please consider giving it a star and citing it. Thanks!

@article{peng2024vasttrack,
  title={VastTrack: Vast Category Visual Object Tracking},
  author={Peng, Liang and Gao, Junyuan and Liu, Xinran and Li, Weihong and Dong, Shaohua and Zhang, Zhipeng and Fan, Heng and Zhang, Libo},
  journal={arXiv preprint arXiv:2403.03493},
  year={2024}
}