This is a third-party implementation, by Zuwei Long and Wei Li, of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.
You can use this code to fine-tune a model on your own dataset, or start pretraining a model from scratch.
| | Official release version | The version we replicated |
|---|---|---|
| Inference | ✔ | ✔ |
| Train (Object Detection data) | ✖ | ✔ |
| Train (Grounding data) | ✖ | ✔ |
| Slurm multi-machine support | ✖ | ✔ |
| Training acceleration strategy | ✖ | ✔ |
We test our models under python=3.7.11, pytorch=1.11.0, cuda=11.3. Other versions may work as well.
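Before building the custom CUDA ops below, it can help to confirm your environment is roughly in line with the tested versions. This is only a quick sanity script and assumes nothing beyond PyTorch being installed:

```python
# Quick environment check; versions other than the tested ones may still work.
import sys
import torch

print("python :", sys.version.split()[0])
print("pytorch:", torch.__version__)
print("cuda   :", torch.version.cuda)          # CUDA version PyTorch was built against
print("gpu    :", torch.cuda.is_available())   # must be True to build and run the CUDA ops
```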
- Clone the Open-GroundingDino repository from GitHub.
git clone https://github.com/longzw1997/Open-GroundingDino.git
- Change the current directory to the Open-GroundingDino folder.
cd Open-GroundingDino/
- Install the required dependencies.
pip install -r requirements.txt
cd models/GroundingDINO/ops
python setup.py build install
# unit test (you should see that all checks are True)
python test.py
cd ../../..
- Download pre-trained model weights.
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
- Download BERT as the language model.
mkdir bert_weights
cd bert_weights
# download the BERT weights from the Google Drive folder below
# (wget cannot fetch a Drive folder directly; download it via a browser or a tool such as gdown)
# https://drive.google.com/drive/folders/1F9cEimkvt_mFD_IKDrmdH6mlpi9gkGHw?usp=drive_link
cd ..
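After both downloads, you can sanity-check the detector checkpoint with PyTorch. This is only an illustrative check; the assumption that the weights sit under a "model" key follows common DETR-style checkpoints and is not guaranteed:

```python
# Sanity-check the downloaded GroundingDINO checkpoint (illustrative only).
import torch

ckpt = torch.load("weights/groundingdino_swint_ogc.pth", map_location="cpu")
# GroundingDINO-style checkpoints usually store weights under a "model" key;
# fall back to the top-level dict otherwise (assumption, not guaranteed).
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"loaded {len(state_dict)} tensors")
```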
- Dataset format: see data_format.md.
- See datasets_mixed_odvg.json | coco2odvg.py | grit2odvg for more details
Example mixed dataset config; a path-check sketch follows the block:
{
"train": [
{
"root": "path/V3Det/",
"anno": "path/V3Det/annotations/v3det_2023_v1_all_odvg.jsonl",
"label_map": "path/V3Det/annotations/v3det_label_map.json",
"dataset_mode": "odvg"
},
{
"root": "path/LVIS/train2017/",
"anno": "path/LVIS/annotations/lvis_v1_train_odvg.jsonl",
"label_map": "path/LVIS/annotations/lvis_v1_train_label_map.json",
"dataset_mode": "odvg"
},
{
"root": "path/Objects365/train/",
"anno": "path/Objects365/objects365_train_odvg.json",
"label_map": "path/Objects365/objects365_label_map.json",
"dataset_mode": "odvg"
},
{
"root": "path/coco_2017/train2017/",
"anno": "path/coco_2017/annotations/coco2017_train_odvg.jsonl",
"label_map": "path/coco_2017/annotations/coco2017_label_map.json",
"dataset_mode": "odvg"
},
{
"root": "path/GRIT-20M/data/",
"anno": "path/GRIT-20M/anno/grit_odvg_620k.jsonl",
"dataset_mode": "odvg"
},
{
"root": "path/flickr30k/images/flickr30k_images/",
"anno": "path/flickr30k/annotations/flickr30k_entities_odvg_158k.jsonl",
"dataset_mode": "odvg"
}
],
"val": [
{
"root": "path/coco_2017/val2017",
"anno": "config/instances_val2017.json",
"label_map": null,
"dataset_mode": "coco"
}
]
}
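Before launching a long training run, it may save time to verify that every path referenced in your mixed-dataset JSON exists. A minimal sketch, assuming your config follows the layout above (the config path is the example file; adapt it to your own setup):

```python
# Check that every root / anno / label_map path in the dataset config exists.
import json
import os

with open("config/datasets_mixed_odvg.json") as f:
    cfg = json.load(f)

for split in ("train", "val"):
    for entry in cfg.get(split, []):
        for key in ("root", "anno", "label_map"):
            path = entry.get(key)
            if path is None:  # label_map is null for coco-style entries and absent for grounding-only sets
                continue
            status = "ok" if os.path.exists(path) else "MISSING"
            print(f"[{split}] {key:9} {status:7} {path}")
```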
Example entries in the ODVG format (one JSON object per line); a reader sketch follows the examples:
{"filename": "000000391895.jpg", "height": 360, "width": 640, "detection": {"instances": [{"bbox": [359.17, 146.17, 471.62, 359.74], "label": 3, "category": "motorcycle"}, {"bbox": [339.88, 22.16, 493.76, 322.89], "label": 0, "category": "person"}, {"bbox": [471.64, 172.82, 507.56, 220.92], "label": 0, "category": "person"}, {"bbox": [486.01, 183.31, 516.64, 218.29], "label": 1, "category": "bicycle"}]}}
{"filename": "000000522418.jpg", "height": 480, "width": 640, "detection": {"instances": [{"bbox": [382.48, 0.0, 639.28, 474.31], "label": 0, "category": "person"}, {"bbox": [234.06, 406.61, 454.0, 449.28], "label": 43, "category": "knife"}, {"bbox": [0.0, 316.04, 406.65, 473.53], "label": 55, "category": "cake"}, {"bbox": [305.45, 172.05, 362.81, 249.35], "label": 71, "category": "sink"}]}}
{"filename": "014127544.jpg", "height": 400, "width": 600, "grounding": {"caption": "Homemade Raw Organic Cream Cheese for less than half the price of store bought! It's super easy and only takes 2 ingredients!", "regions": [{"bbox": [5.98, 2.91, 599.5, 396.55], "phrase": "Homemade Raw Organic Cream Cheese"}]}}
{"filename": "012378809.jpg", "height": 252, "width": 450, "grounding": {"caption": "naive : Heart graphics in a notebook background", "regions": [{"bbox": [93.8, 47.59, 126.19, 77.01], "phrase": "Heart graphics"}, {"bbox": [2.49, 1.44, 448.74, 251.1], "phrase": "a notebook background"}]}}
config/cfg_odvg.py # for backbone, batch size, LR, freeze layers, etc.
config/datasets_mixed_odvg.json # support mixed dataset for both OD and VG
- Before starting training, you need to modify `config/datasets_vg_example.json` according to `data_format.md`.
- The evaluation code defaults to using coco_val2017. If you evaluate on your own test set, convert the test data to COCO format (not the ODVG format described in `data_format.md`) and set use_coco_eval = False in the config; a conversion sketch follows below. (COCO uses 80 classes for training but has 90 category IDs in total, so the code contains a built-in mapping for this.)
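If your test annotations happen to be in the ODVG detection format, the sketch below illustrates the shape of the conversion to a COCO instances JSON. The file names are placeholders and the xyxy box assumption follows the ODVG examples above; this is not a script shipped with the repo:

```python
# Hypothetical sketch: turn ODVG-style detection JSONL into a COCO instances JSON
# so the standard evaluator can be used (with use_coco_eval = False for non-COCO
# category ids).  ODVG boxes are assumed [x1, y1, x2, y2]; COCO expects [x, y, w, h].
import json

images, annotations, categories = [], [], {}
ann_id = 1
with open("my_test_odvg.jsonl") as f:                      # placeholder input file
    for img_id, line in enumerate(f, start=1):
        rec = json.loads(line)
        images.append({"id": img_id, "file_name": rec["filename"],
                       "width": rec["width"], "height": rec["height"]})
        for inst in rec["detection"]["instances"]:
            x1, y1, x2, y2 = inst["bbox"]
            categories.setdefault(inst["label"], inst["category"])
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": inst["label"],
                                "bbox": [x1, y1, x2 - x1, y2 - y1],
                                "area": (x2 - x1) * (y2 - y1),
                                "iscrowd": 0})
            ann_id += 1

coco = {"images": images, "annotations": annotations,
        "categories": [{"id": i, "name": n} for i, n in sorted(categories.items())]}
with open("my_test_coco.json", "w") as f:                  # placeholder output file
    json.dump(coco, f)
```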
# train on slurm:
bash train_multi_node.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
# e.g. check train_multi_node.sh for more details
# bash train_multi_node.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_multi_node.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash train_multi_node.sh v100_32g 8 config/cfg_odvg.py config/datasets_vg_example.json ./logs
# train on dist:
bash train_dist.sh ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
# e.g. check train_dist.sh for more details
# bash train_dist.sh 8 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_dist.sh 8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash train_dist.sh 8 config/cfg_odvg.py config/datasets_vg_example.json ./logs
# eval:
bash test.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
# e.g. check test.sh for more details
# bash test.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash test.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash test.sh v100_32g 8 config/cfg_odvg.py config/datasets_vg_example.json ./logs
| | Name | Backbone | Style | Pretrain data | mAP on COCO | Checkpoint | Config | Log |
|---|---|---|---|---|---|---|---|---|
| 1 | GroundingDINO-T (official) | Swin-T | zero-shot | O365, GoldG, Cap4M | 48.4 (zero-shot) | GitHub link | link | link |
| 2 | GroundingDINO-T (fine-tune) | Swin-T | COCO fine-tune | O365, GoldG, Cap4M | 57.3 (fine-tune) | GitHub link | link | link |
| 3 | GroundingDINO-T (pretrain) | Swin-T | zero-shot | COCO, Objects365, LVIS, V3Det, GRIT-200K, Flickr30k (1.8M in total) | 55.1 (zero-shot) | GitHub link | link | link |
- longzuwei at sensetime.com
- liwei1 at sensetime.com
Any discussions, suggestions and questions are welcome!
The provided code was adapted from the official GroundingDINO implementation. If you find this project useful, please consider citing:
@misc{Open-GroundingDino,
  author       = {Zuwei Long and Wei Li},
  title        = {Open Grounding Dino: The third party implementation of the paper Grounding DINO},
  howpublished = {\url{https://github.com/longzw1997/Open-GroundingDino}},
  year         = {2023}
}
Feel free to contact us if you have any suggestions or questions. Issues are welcome, and please open a PR if you find a bug or want to contribute.