
Open GroundingDino

This is a third-party implementation, by Zuwei Long and Wei Li, of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.

You can use this code to fine-tune a model on your own dataset, or start pretraining a model from scratch.


Supported Features

                                  Official release version   The Version We Replicated
Inference                         ✔                          ✔
Train (Object Detection data)     ✘                          ✔
Train (Grounding data)            ✘                          ✔
Slurm multi-machine support       ✘                          ✔
Training acceleration strategy    ✘                          ✔

Setup

We tested our models with python=3.7.11, pytorch=1.11.0, cuda=11.3. Other versions may work as well.
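
To check whether your environment roughly matches the tested versions, here is a minimal Python sketch (not part of the repository):

import sys
import torch

print("python:", sys.version.split()[0])      # tested with 3.7.11
print("torch :", torch.__version__)           # tested with 1.11.0
print("cuda  :", torch.version.cuda)          # tested with 11.3
print("gpu   :", torch.cuda.is_available())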

  1. Clone the Open-GroundingDino repository from GitHub.
git clone https://github.com/longzw1997/Open-GroundingDino.git
  2. Change the current directory to the Open-GroundingDino folder.
cd Open-GroundingDino/
  3. Install the required dependencies and compile the custom ops.
pip install -r requirements.txt 
cd models/GroundingDINO/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
  4. Download pre-trained model weights.
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
  5. Download BERT as the language model (a quick sanity check of both downloads follows this list).
mkdir bert_weights
cd bert_weights
# download the folder below manually (or with a tool such as gdown); wget cannot fetch a Google Drive folder
# https://drive.google.com/drive/folders/1F9cEimkvt_mFD_IKDrmdH6mlpi9gkGHw?usp=drive_link
cd ..
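
Once both downloads are in place, a quick sanity check is to load them from Python. This is only a sketch: it assumes the paths created above and that the Drive folder unpacks into a standard bert-base-uncased directory (config.json, vocab.txt, model weights) inside bert_weights/.

import torch
from transformers import AutoTokenizer

# Detector checkpoint: some checkpoints wrap the weights under a "model" key.
ckpt = torch.load("weights/groundingdino_swint_ogc.pth", map_location="cpu")
state = ckpt.get("model", ckpt)
print(f"{len(state)} tensors in groundingdino_swint_ogc.pth")

# Text encoder: the folder name is an assumption about how the Drive folder is laid out.
tokenizer = AutoTokenizer.from_pretrained("bert_weights/bert-base-uncased")
print(tokenizer.tokenize("a cat sitting on a chair ."))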

Dataset

Example of a mixed dataset config:
{
  "train": [
    {
      "root": "path/V3Det/",
      "anno": "path/V3Det/annotations/v3det_2023_v1_all_odvg.jsonl",
      "label_map": "path/V3Det/annotations/v3det_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/LVIS/train2017/",
      "anno": "path/LVIS/annotations/lvis_v1_train_odvg.jsonl",
      "label_map": "path/LVIS/annotations/lvis_v1_train_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/Objects365/train/",
      "anno": "path/Objects365/objects365_train_odvg.json",
      "label_map": "path/Objects365/objects365_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/coco_2017/train2017/",
      "anno": "path/coco_2017/annotations/coco2017_train_odvg.jsonl",
      "label_map": "path/coco_2017/annotations/coco2017_label_map.json",
      "dataset_mode": "odvg"
    },
    {
      "root": "path/GRIT-20M/data/",
      "anno": "path/GRIT-20M/anno/grit_odvg_620k.jsonl",
      "dataset_mode": "odvg"
    }, 
    {
      "root": "path/flickr30k/images/flickr30k_images/",
      "anno": "path/flickr30k/annotations/flickr30k_entities_odvg_158k.jsonl",
      "dataset_mode": "odvg"
    }
  ],
  "val": [
    {
      "root": "path/coco_2017/val2017",
      "anno": "config/instances_val2017.json",
      "label_map": null,
      "dataset_mode": "coco"
    }
  ]
}
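
Before launching a run, it can save time to check that every entry in this file points to data that actually exists. A minimal validation sketch using only the standard library (the config path is the one listed in the Config section below):

import json
import os

def check_dataset_config(path="config/datasets_mixed_odvg.json"):
    with open(path) as f:
        cfg = json.load(f)
    for split in ("train", "val"):
        for entry in cfg.get(split, []):
            assert os.path.isdir(entry["root"]), "missing image root: " + entry["root"]
            assert os.path.isfile(entry["anno"]), "missing annotation file: " + entry["anno"]
            # Detection-style odvg entries carry a label_map; grounding entries may omit it.
            if entry.get("label_map"):
                assert os.path.isfile(entry["label_map"]), "missing label map: " + entry["label_map"]
            print(split, entry["dataset_mode"], entry["anno"], "ok")

if __name__ == "__main__":
    check_dataset_config()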
Example of ODVG-format annotations (one JSON object per line; detection entries first, grounding entries below):
{"filename": "000000391895.jpg", "height": 360, "width": 640, "detection": {"instances": [{"bbox": [359.17, 146.17, 471.62, 359.74], "label": 3, "category": "motorcycle"}, {"bbox": [339.88, 22.16, 493.76, 322.89], "label": 0, "category": "person"}, {"bbox": [471.64, 172.82, 507.56, 220.92], "label": 0, "category": "person"}, {"bbox": [486.01, 183.31, 516.64, 218.29], "label": 1, "category": "bicycle"}]}}
{"filename": "000000522418.jpg", "height": 480, "width": 640, "detection": {"instances": [{"bbox": [382.48, 0.0, 639.28, 474.31], "label": 0, "category": "person"}, {"bbox": [234.06, 406.61, 454.0, 449.28], "label": 43, "category": "knife"}, {"bbox": [0.0, 316.04, 406.65, 473.53], "label": 55, "category": "cake"}, {"bbox": [305.45, 172.05, 362.81, 249.35], "label": 71, "category": "sink"}]}}

{"filename": "014127544.jpg", "height": 400, "width": 600, "grounding": {"caption": "Homemade Raw Organic Cream Cheese for less than half the price of store bought! It's super easy and only takes 2 ingredients!", "regions": [{"bbox": [5.98, 2.91, 599.5, 396.55], "phrase": "Homemade Raw Organic Cream Cheese"}]}}
{"filename": "012378809.jpg", "height": 252, "width": 450, "grounding": {"caption": "naive : Heart graphics in a notebook background", "regions": [{"bbox": [93.8, 47.59, 126.19, 77.01], "phrase": "Heart graphics"}, {"bbox": [2.49, 1.44, 448.74, 251.1], "phrase": "a notebook background"}]}}

Config

config/cfg_odvg.py                   # backbone, batch size, learning rate, frozen layers, etc.
config/datasets_mixed_odvg.json      # mixed datasets covering both OD (detection) and VG (grounding) data
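
As an illustration only (the exact option names must be checked against config/cfg_odvg.py itself), the fields typically adjusted for a fine-tuning run look like this:

# Illustrative sketch, not a verbatim excerpt of config/cfg_odvg.py.
batch_size = 4               # per-GPU batch size
lr = 1e-4                    # base learning rate
backbone = "swin_T_224_1k"   # Swin-T backbone (name is an assumption)
use_coco_eval = True         # set to False when evaluating on a custom COCO-format test set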

Training

  • Before starting training, modify config/datasets_vg_example.json according to data_format.md.
  • The evaluation code defaults to coco_val2017. If you evaluate on your own test set, convert it to COCO format (not the odvg format described in data_format.md) and set use_coco_eval = False in the config; a minimal conversion sketch follows the commands below. (COCO uses 80 training classes but category IDs up to 90, so the code includes a built-in mapping.)
# train on slurm:
bash train_multi_node.sh  ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# e.g. see train_multi_node.sh for more details
# bash train_multi_node.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_multi_node.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash train_multi_node.sh v100_32g 8 config/cfg_odvg.py config/datasets_vg_example.json ./logs



# train with torch.distributed:

bash train_dist.sh  ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}


# e.g. see train_dist.sh for more details
# bash train_dist.sh  8 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_dist.sh  8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash train_dist.sh  8 config/cfg_odvg.py config/datasets_vg_example.json ./logs


# eval:
bash test.sh  ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# e.g. see test.sh for more details
# bash test.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash test.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs
# bash test.sh v100_32g 8 config/cfg_odvg.py config/datasets_vg_example.json ./logs
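
If you evaluate on your own test set, it must be a COCO-format JSON as noted in the Training section. A minimal conversion sketch (the input structure, file names, and the sample box are placeholders; COCO boxes are x, y, width, height in pixels):

import json

# placeholder input: one entry per image with ([x, y, w, h], category_id) boxes
my_test_set = [
    {"file_name": "0001.jpg", "height": 480, "width": 640,
     "boxes": [([10.0, 20.0, 100.0, 200.0], 1)]},
]
categories = [{"id": 1, "name": "person"}, {"id": 2, "name": "bicycle"}]

images, annotations, ann_id = [], [], 1
for img_id, im in enumerate(my_test_set, start=1):
    images.append({"id": img_id, "file_name": im["file_name"],
                   "height": im["height"], "width": im["width"]})
    for (x, y, w, h), cat_id in im["boxes"]:
        annotations.append({"id": ann_id, "image_id": img_id, "category_id": cat_id,
                            "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0})
        ann_id += 1

with open("my_test_coco.json", "w") as f:
    json.dump({"images": images, "annotations": annotations, "categories": categories}, f)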

Results and Models

   Name                         Backbone  Style           Pretrain data                                                        mAP on COCO       Checkpoint   Config  Log
1  GroundingDINO-T (official)   Swin-T    zero-shot       O365, GoldG, Cap4M                                                   48.4 (zero-shot)  GitHub link  link    link
2  GroundingDINO-T (finetune)   Swin-T    COCO fine-tune  O365, GoldG, Cap4M                                                   57.3 (fine-tune)  GitHub link  link    link
3  GroundingDINO-T (pretrain)   Swin-T    zero-shot       COCO, Objects365, LVIS, V3Det, GRIT-200K, Flickr30k (1.8M in total)  55.1 (zero-shot)  GitHub link  link    link

GRIT-200K: generated by GLIP and spaCy.

Contact

  • longzuwei at sensetime.com
  • liwei1 at sensetime.com

Any discussions, suggestions and questions are welcome!

Acknowledgments

The provided code was adapted from the official GroundingDINO implementation.

Citation

@misc{OpenGroundingDino,
  author = {Zuwei Long and Wei Li},
  title = {Open Grounding Dino: The Third Party Implementation of the Paper Grounding DINO},
  howpublished = {\url{https://github.com/longzw1997/Open-GroundingDino}},
  year = {2023}
}

Feel free to contact us if you have any suggestions or questions. Issues are welcome, and please open a PR if you find any bugs or want to contribute.