A brand logo detection system using DETR. (DeepLogo, which uses the TensorFlow Object Detection API, is here)
DETR is a Transformer-based object detection model published by Facebook AI in 2020. PyTorch training code and pretrained models are also available on GitHub.
DeepLogo2 provides a training and inference environment for creating brand logo detection models using DETR.
DeepLogo2 uses the flickr logos 27 dataset, which contains 27 classes of brand logo images downloaded from Flickr. The brands included in the dataset are: Adidas, Apple, BMW, Citroen, Coca Cola, DHL, Fedex, Ferrari, Ford, Google, Heineken, HP, McDonalds, Mini, NBC, Nike, Pepsi, Porsche, Puma, Red Bull, Sprite, Starbucks, Intel, Texaco, Unicef, Vodafone and Yahoo.
To fine-tune DETR, the dataset is converted to COCO format:
python preproc_annot.py
python flickr2coco.py --mode train --output_dir flickr_logos_27_dataset
python flickr2coco.py --mode test --output_dir flickr_logos_27_dataset
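The conversion step can be illustrated with a minimal sketch. This is not the code in flickr2coco.py; it assumes each annotation has already been parsed into a hypothetical `(filename, class_name, x1, y1, x2, y2)` row with corner coordinates, and it shows the one detail that most often goes wrong: COCO stores boxes as `[x, y, width, height]`. The function name `rows_to_coco` and the zero-based category ids are assumptions made for illustration, and image width/height fields are omitted because they require reading the image files.

```python
def rows_to_coco(rows):
    """Convert (filename, class_name, x1, y1, x2, y2) rows to a COCO-style dict."""
    # Assumption: category ids are assigned alphabetically, starting at 0.
    class_names = sorted({row[1] for row in rows})
    cat_ids = {name: idx for idx, name in enumerate(class_names)}

    image_ids = {}
    images, annotations = [], []
    for filename, class_name, x1, y1, x2, y2 in rows:
        # Register each image once, in order of first appearance.
        if filename not in image_ids:
            image_ids[filename] = len(image_ids)
            images.append({"id": image_ids[filename], "file_name": filename})

        width, height = x2 - x1, y2 - y1
        if width <= 0 or height <= 0:
            continue  # skip degenerate (empty) boxes

        annotations.append({
            "id": len(annotations),
            "image_id": image_ids[filename],
            "category_id": cat_ids[class_name],
            "bbox": [x1, y1, width, height],  # COCO uses x, y, w, h
            "area": width * height,
            "iscrowd": 0,
        })

    categories = [{"id": idx, "name": name} for name, idx in cat_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    import json

    demo_rows = [
        ("a.jpg", "Adidas", 38, 12, 234, 142),
        ("a.jpg", "Nike", 10, 10, 20, 30),
    ]
    print(json.dumps(rows_to_coco(demo_rows), indent=2))
```

The resulting dict can be written with `json.dump` to produce an annotation file of the kind the training step reads.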
DeepLogo2 incorporates the DETR repository as a subtree, with the following changes for fine-tuning on the flickr logos 27 dataset.
Note: For code modifications for fine-tuning DETR, please refer to woctezuma/detr_fine_tune.md
- Add custom dataset builder method (detr/datasets/flickr_logos_27.py, detr/datasets/__init__.py)
def build(image_set, args):
    root = Path(args.coco_path)
    assert root.exists(), f'provided root path {root} does not exist'
    train_json = 'flickr_logos_27_train.json'
    test_json = 'flickr_logos_27_test.json'
    # Both splits share one image folder; only the annotation file differs.
    PATHS = {
        "train": (root / 'flickr_logos_27_dataset_images', root / train_json),
        "val": (root / 'flickr_logos_27_dataset_images', root / test_json),
    }
    img_folder, ann_file = PATHS[image_set]
    dataset = CocoDetection(img_folder, ann_file, transforms=make_coco_transforms(image_set), return_masks=args.masks)
    return dataset

def build_dataset(image_set, args):
    ...
    if args.dataset_file == 'flickr_logos_27':
        from .flickr_logos_27 import build as build_flickr_logos_27
        return build_flickr_logos_27(image_set, args)
    raise ValueError(f'dataset {args.dataset_file} not supported')
- Modify num_classes to match the flickr logos 27 dataset (detr/models/detr.py)
def build(args):
    # the `num_classes` naming here is somewhat misleading.
    # it indeed corresponds to `max_obj_id + 1`, where max_obj_id
    # is the maximum id for a class in your dataset. For example,
    # COCO has a max_obj_id of 90, so we pass `num_classes` to be 91.
    # As another example, for a dataset that has a single class with id 1,
    # you should pass `num_classes` to be 2 (max_obj_id + 1).
    # For more details on this, check the following discussion
    # https://github.com/facebookresearch/detr/issues/108#issuecomment-650269223
    num_classes = 20 if args.dataset_file != 'coco' else 91
    if args.dataset_file == "coco_panoptic":
        # for panoptic, we just add a num_classes that is large enough to hold
        # max_obj_id + 1, but the exact value doesn't really matter
        num_classes = 250
    if args.dataset_file == 'flickr_logos_27':
        num_classes = 27  # max_obj_id: 26
    ...
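- Delete the classification head and load the state dict (delete_head_and_save.py, detr/main.py)
The pretrained classification head was trained for COCO's classes, so it cannot be reused for the 27 logo classes. Get the pretrained weights with the following script, which deletes the head and saves the result as a new file.
python delete_and_save.py
Load the state dict in main.py with strict=False, so that the missing classification-head weights do not raise an error: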
model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
Reference: facebookresearch/detr#9 (comment)
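The last two changes above can be sketched together. This is only an illustration under stated assumptions, not the repository's script: the helper names `num_classes_from_coco` and `strip_class_head` are hypothetical, the checkpoint is modeled as a plain dict with a `"model"` entry, and the classification head is assumed to be stored under keys starting with `class_embed.`, as in DETR's model definition. A toy dict stands in for the real weights so the example runs without PyTorch.

```python
def num_classes_from_coco(coco):
    """Return max category id + 1, the rule quoted in detr/models/detr.py."""
    return max(cat["id"] for cat in coco["categories"]) + 1


def strip_class_head(checkpoint, prefix="class_embed."):
    """Return a copy of the checkpoint without the classification-head entries."""
    model_state = {
        key: value
        for key, value in checkpoint["model"].items()
        if not key.startswith(prefix)
    }
    # Keep any other top-level entries of the checkpoint unchanged.
    return {**checkpoint, "model": model_state}


if __name__ == "__main__":
    # Toy stand-ins (assumptions): 27 categories with ids 0..26, and a
    # checkpoint whose values are placeholders instead of tensors.
    coco = {"categories": [{"id": idx} for idx in range(27)]}
    toy_checkpoint = {
        "model": {
            "class_embed.weight": "w",
            "class_embed.bias": "b",
            "bbox_embed.layers.0.weight": "x",
            "query_embed.weight": "q",
        }
    }

    print(num_classes_from_coco(coco))  # 27
    print(sorted(strip_class_head(toy_checkpoint)["model"]))
```

With a real checkpoint, the same function could be applied between `torch.load` and `torch.save`. Because the saved file then lacks the head weights, loading it requires `strict=False`, and the new head is trained from its fresh initialization.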
To fine-tune DETR on the flickr logos 27 dataset:
python detr/main.py \
--dataset_file "flickr_logos_27" \
--coco_path "flickr_logos_27_dataset" \
--output_dir "outputs" \
--resume "detr-r50_no-class-head.pth" \
--epochs 100
It takes about 3 hours and 15 minutes with Google Colab Pro to run 100 epochs.
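For planning runs with a different `--epochs` value, the reported figure works out to roughly 117 seconds per epoch. The sketch below only rescales that single measurement; it assumes training time grows linearly with the number of epochs and that the hardware matches the reference run, neither of which is guaranteed on a shared service such as Colab. The function name `estimate_seconds` is introduced here for illustration.

```python
# Reference measurement from the run described above: 100 epochs in 3 h 15 min.
REF_SECONDS = 3 * 3600 + 15 * 60
REF_EPOCHS = 100


def estimate_seconds(epochs):
    """Linearly rescale the reference run time to another epoch count."""
    return REF_SECONDS * epochs / REF_EPOCHS


if __name__ == "__main__":
    for epochs in (1, 50, 300):
        seconds = estimate_seconds(epochs)
        print(f"{epochs:>3} epochs: about {seconds / 60:.1f} minutes")
```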
The DETR fine-tuning can be checked by running Train_DeepLogo2_by_detr.ipynb.