V3Det: Vast Vocabulary Visual Detection Dataset

Jiaqi Wang*, Pan Zhang*, Tao Chu*, Yuhang Cao*,
Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
(* equal contribution)
Accepted to ICCV 2023 (Oral)

Paper, Dataset

Codebase

Object Detection

mmdetection: https://github.com/V3Det/mmdetection-V3Det/tree/main/configs/v3det
Detectron2: https://github.com/V3Det/Detectron2-V3Det

Open Vocabulary Detection (OVD)

Detectron2: https://github.com/V3Det/Detectron2-V3Det

Data Format

The data includes a Train Set, a Val Set, and a Test Set, comprising 13,204 categories.

Split	Images	BBoxes
Train Set	183,354	1,357,377
Val Set	29,821	220,429
Test Set	29,863	219,012
Train Set OVD (Base Class)	132,437	836,203

The 13,204 categories are split into 6,709 base classes and 6,495 novel classes for OVD tasks. For each of the 13,204 categories, we provide an exemplar image and detailed descriptions from various sources (human experts, ChatGPT, GPT-4, and GPT-4V).

Base Class	Novel Class	All Class
6709	6495	13204

The Train Set OVD (Base Class) is a subset of the Train Set that keeps only the annotations of base classes; it is prepared for OVD (Open-Vocabulary Detection) tasks. Images left without any annotations after the novel-class annotations are filtered out are removed.

Split	Images	BBoxes
Train Set	183,354	1,357,377
Train Set OVD (Base Class)	132,437	836,203
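The filtering rule described above can be sketched in a few lines. This is only an illustration, not the script used to build the released file: it assumes the annotation dictionary layout described under Annotation Files, and it assumes the `novel` field is truthy for novel classes. The function name `keep_base_annotations` is hypothetical, and keeping the full category list in the output is also an assumption.

```python
def keep_base_annotations(dataset):
    """Drop novel-class annotations, then drop images left without annotations.

    Assumes `dataset` has "images", "categories", and "annotations" lists and
    that each category's "novel" field is truthy for novel classes.
    """
    novel_ids = {cat["id"] for cat in dataset["categories"] if cat["novel"]}

    # Keep only the boxes whose category is a base class.
    annotations = [
        ann for ann in dataset["annotations"]
        if ann["category_id"] not in novel_ids
    ]

    # Keep only the images that still have at least one box.
    kept_image_ids = {ann["image_id"] for ann in annotations}
    images = [img for img in dataset["images"] if img["id"] in kept_image_ids]

    return {**dataset, "images": images, "annotations": annotations}


# Tiny made-up sample: image 2 only contains a novel-class box.
sample = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 10, "novel": False}, {"id": 11, "novel": True}],
    "annotations": [
        {"image_id": 1, "category_id": 10},
        {"image_id": 1, "category_id": 11},
        {"image_id": 2, "category_id": 11},
    ],
}
base_only = keep_base_annotations(sample)
```

On the sample, one annotation and one image survive, which mirrors why the OVD split has fewer images than the full Train Set.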

The data organization is:

V3Det/
    images/
        <category_node>/
            |────<image_name>.png
            ...
        ...
    test/
        |────<image_name>.png
        ...
    exemplar_images/
        |────<category_id>.jpg
        ...
    annotations/
        |────v3det_2023_v1_category_tree.json       # Category tree
        |────category_name_13204_v3det_2023_v1.txt  # Category name
        |────v3det_2023_v1_train.json               # Train set
        |────v3det_2023_v1_train_ovd_base.json      # Open vocabulary detection train set
        |────v3det_2023_v1_val.json                 # Validation set
        |────v3det_2023_v1_test_image_info.json     # Image information of test set
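As a quick sanity check after downloading, the layout above can be verified with a short script. This is a minimal sketch: the file names are copied from the tree above, while the helper name `missing_annotation_files` and the idea of checking only the annotations folder are assumptions made for illustration.

```python
from pathlib import Path

# Annotation file names as listed in the directory tree above.
ANNOTATION_FILES = [
    "v3det_2023_v1_category_tree.json",
    "category_name_13204_v3det_2023_v1.txt",
    "v3det_2023_v1_train.json",
    "v3det_2023_v1_train_ovd_base.json",
    "v3det_2023_v1_val.json",
    "v3det_2023_v1_test_image_info.json",
]


def missing_annotation_files(root):
    """Return the expected annotation files not found under <root>/annotations."""
    ann_dir = Path(root) / "annotations"
    return [name for name in ANNOTATION_FILES if not (ann_dir / name).is_file()]


if __name__ == "__main__":
    # "./V3Det" is the default download location mentioned in this README.
    for name in missing_annotation_files("./V3Det"):
        print("missing:", name)
```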

Annotation Files

Train/Val

The annotation files are provided in dictionary format and contain the keys "images", "categories", and "annotations".

images: stores a list of image information, where each element is a dictionary representing an image.

    file_name            # The relative image path, e.g. images/n07745046/21_371_29405651261_633d076053_c.jpg.
    height               # The height of the image
    width                # The width of the image
    id                   # Unique identifier of the image.

categories: stores a list of category information, where each element is a dictionary representing a category.

    name                 # English name of the category.
    name_zh              # Chinese name of the category.
    cat_info             # Descriptions of the category, stored as a list.
    cat_info_gpt         # Descriptions of the category generated by ChatGPT, stored as a list.
    cat_info_gpt4        # Descriptions of the category generated by GPT-4.
    cat_info_gpt4v       # Descriptions of the category generated by GPT-4V.
    novel                # For open-vocabulary detection, indicates whether the category is a novel class.
    id                   # Unique identifier of the category.
    exemplar_image       # Exemplar image of the category.

annotations: stores a list of annotation information, where each element is a dictionary representing a bounding box annotation.

    image_id             # The unique identifier of the image where the bounding box is located.
    category_id          # The unique identifier of the category corresponding to the bounding box.
    bbox                 # The coordinates of the bounding box, in the format [x, y, w, h], representing the top-left corner coordinates and the width and height of the box.
    iscrowd              # Whether the bounding box is a crowd box.
    area                 # The area of the bounding box
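The fields above are enough to group boxes by image. The sketch below is illustrative only: it uses the keys documented in this section, while the helper names (`load_dataset`, `xywh_to_xyxy`, `boxes_by_image`) and the sample values are made up for the example.

```python
import json
from collections import defaultdict


def load_dataset(path):
    """Load a train/val annotation file, e.g. annotations/v3det_2023_v1_val.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def xywh_to_xyxy(bbox):
    """Convert [x, y, w, h] (top-left corner plus size) to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def boxes_by_image(dataset):
    """Map each image file name to a list of (category name, corner box) pairs."""
    id_to_file = {img["id"]: img["file_name"] for img in dataset["images"]}
    id_to_name = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(
            (id_to_name[ann["category_id"]], xywh_to_xyxy(ann["bbox"]))
        )
    return dict(grouped)


# Made-up sample that follows the documented keys.
sample = {
    "images": [{"id": 1, "file_name": "images/n0/a.jpg", "height": 100, "width": 200}],
    "categories": [{"id": 5, "name": "apple"}],
    "annotations": [
        {"image_id": 1, "category_id": 5, "bbox": [10, 20, 30, 40],
         "iscrowd": 0, "area": 1200},
    ],
}
grouped = boxes_by_image(sample)
```

With a real file, `boxes_by_image(load_dataset(path))` would be the equivalent call; the corner format is often more convenient for drawing or computing overlaps.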

Category Tree

The category tree stores information about dataset category mappings and relationships in dictionary format.

    categoryid2treeid    # Maps each category identifier in the dataset to the identifier of its node in the category tree
    id2name              # English name corresponding to each node in the category tree
    id2name_zh           # Chinese name corresponding to each node in the category tree
    id2desc              # English description corresponding to each node in the category tree
    id2desc_zh           # Chinese description corresponding to each node in the category tree
    id2synonym_list      # List of synonyms corresponding to each node in the category tree
    id2center_synonym    # Center synonym corresponding to each node in the category tree
    father2child         # All direct child categories corresponding to each node in the category tree
    child2father         # All direct parent categories corresponding to each node in the category tree
    ancestor2descendant  # All descendant nodes corresponding to each node in the category tree
    descendant2ancestor  # All ancestor nodes corresponding to each node in the category tree
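To illustrate how the direct and transitive relations fit together, the sketch below derives all ancestors of a node from the direct-parent mapping alone. It assumes `child2father` maps a node identifier to a list of direct parent identifiers, as the field description suggests; the exact key and value types in the released file are not confirmed here, and the node names in the sample are invented.

```python
def ancestors(node, child2father):
    """Collect every ancestor of `node` by repeatedly following direct parents.

    Assumes `child2father` maps a node id to a list of its direct parent ids.
    Nodes missing from the mapping are treated as having no parents.
    """
    found = []
    stack = list(child2father.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in found:  # a node may be reachable through several paths
            found.append(parent)
            stack.extend(child2father.get(parent, []))
    return found


# Made-up three-level tree: apple -> fruit -> food.
child2father = {"apple": ["fruit"], "fruit": ["food"], "food": []}
apple_ancestors = ancestors("apple", child2father)
```

In practice the precomputed `descendant2ancestor` mapping should make this traversal unnecessary; the sketch only shows what that mapping is expected to contain.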

Image Download

Run the following command to crawl the train and val images. By default, the images are stored in the './V3Det/' directory.

python v3det_image_download.py

If you want to change the storage location, you can specify the desired folder by adding the option '--output_folder' when executing the script.

python v3det_image_download.py --output_folder our_folder

Run the following command to crawl the test images.

python v3det_test_image_download.py [--output_folder our_folder]

Run the following command to crawl the exemplar images.

python v3det_exemplar_image_download.py [--output_folder our_folder]

Category Tree Visualization

Run the following command, then select the dataset path (path/to/V3Det) to visualize the category tree.

python v3det_visualize_tree.py

Please refer to the TreeUI Operation Guide for more information.

Evaluation

We provide evaluation code here. To evaluate a model, follow the steps below.

Step 1. Install Requirements

pip install pycocotools tqdm
pip install openmim
mim install mmengine

Step 2. Format Results

Format your detection results in the COCO JSON format.
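As a rough guide, the standard COCO detection results format is a JSON list in which each record has `image_id`, `category_id`, `bbox` in [x, y, w, h], and `score`. The sketch below converts detections into that layout. The input structure (a list of dictionaries with a `box_xyxy` key) and the function name are hypothetical, and whether the evaluation script expects any additional fields is not confirmed here.

```python
import json


def to_coco_results(detections):
    """Convert corner-format detections into COCO-style result records.

    `detections` is a hypothetical input: dictionaries with "image_id",
    "category_id", "box_xyxy" ([x1, y1, x2, y2]), and "score".
    """
    results = []
    for det in detections:
        x1, y1, x2, y2 = det["box_xyxy"]
        results.append({
            "image_id": det["image_id"],
            "category_id": det["category_id"],
            "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO uses [x, y, w, h]
            "score": float(det["score"]),
        })
    return results


def save_results(results, path):
    """Write the result records to a JSON file for the evaluation step."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f)


# Made-up detection used only to show the conversion.
example = to_coco_results(
    [{"image_id": 1, "category_id": 7, "box_xyxy": [10, 20, 50, 80], "score": 0.9}]
)
```

The `image_id` and `category_id` values should match the identifiers in the annotation files so that predictions line up with the ground truth.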

Step 3. Evaluate

Run the Python script:

python eval_v3det.py dt_json_path

License

V3Det Images: Around 90% of the images in V3Det were selected from the Bamboo Dataset, which is sourced from the Flickr website. The remaining 10% were crawled directly from Flickr. We do not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. We only provide lists of image URLs and do not redistribute the images.
V3Det Annotations: The V3Det annotations, the category relationship tree, and related tools are licensed under a Creative Commons Attribution 4.0 License (commercial use is allowed).

Citation

@inproceedings{wang2023v3det,
      title = {V3Det: Vast Vocabulary Visual Detection Dataset}, 
      author = {Wang, Jiaqi and Zhang, Pan and Chu, Tao and Cao, Yuhang and Zhou, Yujie and Wu, Tong and Wang, Bin and He, Conghui and Lin, Dahua},
      booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
      month = {October},
      year = {2023}
}
