Colruyt products detection

This code trains a Faster R-CNN architecture with a ResNet-50 backbone to detect and categorize a set of 60 types of Colruyt products (plus the background). The code is written in PyTorch.

How to use

Install environment

To create the conda environment in which this solution was developed:

conda env create -f environment.yml

Then activate it:

conda activate gym_env

Prepare the data

This repository contains no images, since they are too heavy to upload. However, you can build the expected folder structure by following these steps:

  1. Under the data folder, create an images folder and populate it with the images of the exercise; there should be exactly 31,000 images.
  2. From the data folder, launch python3 separate_data.py. This script moves the images into 4 different folders:
    • train/images
    • test/images
    • val/images
    • unlabeled/images
  3. That's it, you are ready.

Prepare the model

Create a model folder in the root of this repo and download the following model into it: https://drive.google.com/file/d/119RkfoVjpSrL2-fw_-8HpMvKKnF-joJx/view?usp=sharing

Generate testing results

The results should already be available under data/test/result.csv in the format requested by the Colruyt team. To re-generate the expected CSV results file, launch the export_test.py script; you need to configure the following parameter:

  • model_path: path to your trained model; if it does not exist, a COCO-pretrained model is loaded instead (with poor results)

Training

To train the model, launch the training.py script; you need to configure the following parameters:

  • BATCH_SIZE: this value depends on the GPU memory available (default: 4)
  • EPOCHS: number of training epochs (default: 200)
  • checkpoint_model: the training starts by loading the weights of this model (if it does not exist, a COCO-pretrained model is loaded)
  • new_model: path where the new model will be stored
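
Concretely, the top of training.py exposes parameters roughly like these (the values are the defaults listed above; the new_model file name is a hypothetical example, not necessarily the name used in the script):

```python
BATCH_SIZE = 4    # depends on the GPU memory available
EPOCHS = 200      # number of training epochs

# Weights to resume from; if this file does not exist, a COCO-pretrained
# model is loaded instead.
checkpoint_model = "model/fasterrcnn_resnet50_fpn_bb5_p3.pt"

# Where the newly trained model will be stored (hypothetical file name).
new_model = "model/fasterrcnn_resnet50_fpn_new.pt"
```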

NOTE: The training set has been extended by manually labeling 300 extra images that were unlabeled and did not belong to the test set. This was done using the following repository: https://github.com/jsbroks/coco-annotator

Evaluation on validation set

A custom validation set of 100 difficult images has been labeled manually using the COCO-annotator repo. This validation set contains no images from the test set. You can assess your model by following these steps:

  • Under src, launch git clone https://github.com/philferriere/cocoapi.git
  • And install pycocotools with pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
  • In the prepare_eval.py script, set the model_name variable to your model
  • Launch python3 prepare_eval.py
  • The results are generated in the data/val/results folder
  • Move the evaluation.ipynb notebook to src/cocoapi/PythonAPI/demos
  • Configure the variables annDir, annFile and resFile according to your machine and your model name
  • Launch the evaluation.ipynb notebook

Visualize

To visualize the model predictions on the validation set, launch the visualize.py script; you need to configure the following parameter:

  • model_path: path to your trained model; if it does not exist, a COCO-pretrained model is loaded (with poor results)
  • Launch python3 visualize.py
  • Press q to go to the next image
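
The drawing step boils down to overlaying the predicted boxes on each image. A minimal sketch with Pillow (the actual script may use a different library such as OpenCV; boxes are assumed to be in the [x_min, y_min, x_max, y_max] format that torchvision detectors return):

```python
from PIL import Image, ImageDraw

def draw_boxes(image, boxes, labels=None, color=(255, 0, 0)):
    """Return a copy of image with one rectangle (and optional label) per box."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for i, box in enumerate(boxes):
        draw.rectangle(list(box), outline=color, width=2)
        if labels:
            # Put the category label just inside the top-left corner of the box
            draw.text((box[0] + 2, box[1] + 2), str(labels[i]), fill=color)
    return out
```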

Expected folder structure

.
├── data
│   ├── images
│   ├── info.docx
│   ├── separate_data.py
│   ├── test
│   │   ├── images
│   │   ├── result.csv
│   │   └── test.json
│   ├── train
│   │   ├── images
│   │   ├── train_info.json
│   │   ├── train.json
│   │   └── train_results.json
│   ├── unlabeled
│   │   ├── convert_unlabeled.py
│   │   ├── images
│   │   ├── train_info_ext.json
│   │   └── val_info_ext.json
│   └── val
│       ├── images
│       ├── results
│       ├── val_info.json
│       └── val.json
├── environment.yml
├── model
│   └── fasterrcnn_resnet50_fpn_bb5_p3.pt
├── README.md
├── runs
└── src
    ├── cocoapi
    ├── ColruytDataset.py
    ├── export_test.py
    ├── prepare_eval.py
    ├── training.py
    ├── utils.py
    └── visualize.py