A logo detection system using YOLOv5.
Brands want to understand who uses their products and how. One way to do so is to use computer vision algorithms to detect relevant pictures on social media. Object detection is the task of locating and classifying objects in images or videos. Our goal here was to train a model able to detect brand logos.
We developed our code using Google Colab, and then trained the largest models on an Azure Virtual Machine.
To use the package, follow the guidelines below:
- Install Python 3.8
- Install the latest version of PyTorch with CUDA 11.3 support from the official website
- Install the required packages:
$ git clone https://github.com/Kasrazn97/Logo_Detection
$ cd Logo_Detection
$ pip install -r requirments.txt
Alternative: You can also recreate the conda environment used for this project with the following steps:
- Clone the repository
$ cd Logo_Detection
$ conda env create -f environment.yml
$ conda activate yolonew
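Whichever route you take, it is worth verifying that the CUDA-enabled PyTorch build actually sees your GPU before training:

```python
import torch

print(torch.__version__)          # expect a CUDA 11.3 build, e.g. '1.10.0+cu113'
print(torch.cuda.is_available())  # should print True if the GPU is visible
```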
We used two model architectures from the YOLOv5 family: YOLOv5s and YOLOv5l. Experimenting with dataset preprocessing steps and fine-tuning procedures, we produced 5 different models and compared their performance.
We chose YOLOv5 since it is a state-of-the-art model and is considered one of the best in terms of the speed/accuracy trade-off. Here are the details of the two models we chose to train (you can find the details of all the YOLOv5 models here):
| Model | size (pixels) | mAP val 0.5:0.95 | mAP val 0.5 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B) |
|---|---|---|---|---|---|---|---|---|
| YOLOv5s | 640 | 37.2 | 56.0 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
| YOLOv5l | 640 | 48.8 | 67.2 | 430 | 10.1 | 2.7 | 46.5 | 109.1 |
The raw dataset consists of images containing the following logos: Nike, Adidas, Under Armour, Puma, The North Face, Starbucks, Apple Inc., Mercedes-Benz, NFL, Coca-Cola, Chanel, Toyota, Pepsi, and Hard Rock Cafe. We used Roboflow to convert the dataset to COCO format and apply preprocessing steps, including image resizing and data augmentation. In particular, we applied the following augmentations: rotation, blur, flip, shear, exposure, mosaic, and crop (both at the image and bounding-box level).
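We applied these augmentations through Roboflow's interface, but for readers who prefer code, a roughly equivalent offline pipeline could be sketched with the Albumentations library (this is not part of this project's pipeline, and the parameters are illustrative). Mosaic is omitted because YOLOv5 applies it inside its own dataloader:

```python
import cv2
import albumentations as A

# Illustrative pipeline mirroring the augmentation types listed above.
transform = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),                                # rotation
        A.Blur(blur_limit=3, p=0.3),                              # blur
        A.HorizontalFlip(p=0.5),                                  # flip
        A.Affine(shear=10, p=0.3),                                # shear
        A.RandomBrightnessContrast(p=0.3),                        # exposure
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.3),  # crop
    ],
    # YOLO-format boxes: (x_center, y_center, width, height), normalized
    bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']),
)

image = cv2.imread('example.jpg')  # hypothetical input image
out = transform(image=image, bboxes=[(0.5, 0.5, 0.2, 0.2)], class_labels=[0])
```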
Our final models differ both in the input data used and the training steps applied.
- YOLOv5s (version 1): trained on the raw dataset plus augmentations, around 40k images in total. We kept the 10 backbone layers frozen and fine-tuned the rest. Since the model's results were unsatisfactory, we manually cleaned the data by removing poorly annotated images.
- YOLOv5s (version 2): trained on the cleaned dataset with augmentations (about 20k images in total). Again, we trained all layers except for the backbone.
- YOLOv5s (version 3): trained on the cleaned dataset with extra augmentation steps, for a total of around 60k images, starting from the version 2 weights. Only the last 6 layers were trained, keeping the other 18 frozen.
- YOLOv5s (version 4): trained on the combined dataset from versions 2 and 3, tuning all layers except for the backbone.
- YOLOv5l: trained on the combined dataset from versions 2 and 3 with additional augmentation steps; the training and validation sets summed to around 90k images. Again, we trained all layers except for the backbone.
You can download the weights of all 5 models from here.
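Once downloaded, a checkpoint can be loaded for quick inference through torch.hub (the path below is an assumption for illustration; adjust it to wherever you placed the weights):

```python
import torch

# Load a fine-tuned checkpoint; the path is illustrative.
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='Assets/Models/yolov5l_extra_cleanData/weights/best.pt')

results = model('path/to/image.jpg')   # run inference on a single image
results.print()                        # print a summary of the detections
print(results.pandas().xyxy[0])        # detections as a pandas DataFrame
```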
We used two different metrics to evaluate our models:
- IoU
- mAP
IoU (Intersection over Union) is an evaluation metric that measures the overlap between a predicted bounding box and the ground-truth box, and is used to judge how well an object detector localizes objects. mAP (mean Average Precision) is the Average Precision (computed as the area under the precision-recall curve for a given class) averaged over all classes.
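For reference, a minimal sketch of how the IoU of two axis-aligned boxes in (x1, y1, x2, y2) corner format can be computed:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width/height clamp to 0 when the boxes do not overlap
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```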
Here are the average results for each model:
| Model | mAP val 0.5 | mAP val 0.5:0.95 |
|---|---|---|
| YOLOv5s - v1 | 0.598 | 0.364 |
| YOLOv5s - v2 | 0.851 | 0.662 |
| YOLOv5s - v3 | 0.846 | 0.563 |
| YOLOv5s - v4 | 0.881 | 0.664 |
| YOLOv5l | 0.943 | 0.713 |
(Figure: example detections from YOLOv5s - v1, YOLOv5s - v3, and YOLOv5l.)
Here are the results of YOLOv5l for each logo:
| Logo | mAP val 0.5 | mAP val 0.5:0.95 | IoU (>50% confidence) | IoU (>10% confidence) |
|---|---|---|---|---|
| Adidas | 0.98 | 0.753 | 0.873 | 0.897 |
| Apple Inc. | 0.981 | 0.761 | 0.896 | 0.902 |
| Chanel | 0.981 | 0.678 | 0.797 | 0.873 |
| Coca Cola | 0.886 | 0.619 | 0.786 | 0.853 |
| Hard Rock Cafe | 0.957 | 0.743 | 0.859 | 0.883 |
| Mercedes Benz | 0.984 | 0.789 | 0.915 | 0.924 |
| NFL | 0.965 | 0.731 | 0.876 | 0.892 |
| Nike | 0.959 | 0.706 | 0.868 | 0.889 |
| Pepsi | 0.942 | 0.676 | 0.843 | 0.860 |
| Puma | 0.925 | 0.667 | 0.793 | 0.865 |
| Starbucks | 0.975 | 0.823 | 0.916 | 0.924 |
| The North Face | 0.975 | 0.741 | 0.887 | 0.907 |
| Toyota | 0.961 | 0.737 | 0.883 | 0.903 |
| Under Armour | 0.977 | 0.71 | 0.867 | 0.880 |
Download the trained models' results and weights from here. Create a folder called Assets inside the project folder and put the downloaded folder inside it:
```
Logo_Detection
|--Assets
   |--Models
      |--yolov5l_extra_cleanData
      ...
```
Train and Inference:
- Clone the repository on your local machine.
$ cd Logo_Detection
- Create a folder named Assets
$ cd Assets
- Create a folder named dataset
- Put your YOLO-formatted data inside it, following the structure below (an example label file is shown after the tree):
```
Assets
|--dataset
   |--train
   |  |--images
   |  |--labels
   |--valid
   |  |--images
   |  |--labels
   |--test
      |--images
      |--labels
```
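Each file under labels/ follows the standard YOLO convention: one line per object, containing the class index followed by the bounding box center coordinates, width, and height, all normalized to [0, 1]. The values below are purely illustrative:

```
3 0.512 0.430 0.210 0.180
0 0.250 0.660 0.120 0.095
```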
- Put the related your_data.yaml file under yolov5/data (see the example logos_yolo5.yaml; a sketch of such a file is shown after this list)
$ cd yolov5
- Run the following command (you can change the model name or any other settings you want):
$ python train.py --batch-size 32 --weights yolov5s.pt --data your_data.yaml --epochs 50 --hyp hyp.finetune.yaml --freeze 10
For more information regarding data preparation, refer to: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
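A minimal sketch of what your_data.yaml might contain, assuming the folder layout above and the 14 brand classes listed earlier (the paths and the class order are assumptions; check logos_yolo5.yaml for the exact file used in this project):

```yaml
# Dataset locations, relative to the yolov5/ folder (assumed layout)
train: ../Assets/dataset/train/images
val: ../Assets/dataset/valid/images
test: ../Assets/dataset/test/images

nc: 14  # number of classes
names: ['Nike', 'Adidas', 'Under Armour', 'Puma', 'The North Face',
        'Starbucks', 'Apple Inc.', 'Mercedes-Benz', 'NFL', 'Coca-Cola',
        'Chanel', 'Toyota', 'Pepsi', 'Hard Rock Cafe']
```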
Detection:
- Clone the repository on your local machine.
$ cd Logo_Detection
- Create a folder named Assets
$ cd Assets
- Create a folder named testnow
- Put all the images you want to run inference on inside the testnow folder
Once the algorithm has finished training, we can evaluate its performance on a test set (you can download it from here, or retrieve the image names and their predictions from test_results.csv). Follow the instructions below to run the algorithm and retrieve, for each input, both image.jpg with bounding boxes drawn around the predictions and the corresponding image.txt label file describing the detected classes and their bounding boxes in the format (class_id, x_cen, y_cen, width, height, confidence):
- Open your terminal
- Activate the environment in which you installed the requirements (requirments.txt)
- Open detect_batch.sh with a text editor, change the variable Modelname according to the specific model you're evaluating (exact names are specified inside detect_batch.sh), and run the script with bash detect_batch.sh.
- See the results under Assets/outputs/<Modelname>. There you will find a folder containing all the images; inside it, another folder called "labels" contains the predicted label files.
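As a convenience, here is a small sketch (assuming the output layout above; the model folder name is a placeholder) showing how the predicted label files could be read back in Python:

```python
from pathlib import Path

# Collect the label files written by detect_batch.sh.
# 'yolov5l_extra_cleanData' is a placeholder; use the Modelname you evaluated.
labels_dir = Path('Assets/outputs/yolov5l_extra_cleanData/labels')

for label_file in sorted(labels_dir.glob('*.txt')):
    for line in label_file.read_text().splitlines():
        # Each line: class_id x_cen y_cen width height confidence
        class_id, x_cen, y_cen, width, height, confidence = line.split()
        print(f'{label_file.stem}: class {class_id} @ confidence {confidence}')
```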