Colruyt products detection

This code trains a Faster R-CNN architecture with a ResNet-50 backbone to detect and categorize a set of 60 types of Colruyt products (plus the background). The code is written in PyTorch.

How to use

Install environment

To create the conda environment in which this solution was developed:

conda env create -f environment.yml

Then activate it:

conda activate gym_env

Prepare the data

This repository contains no images, since they are too heavy to upload. However, you can build the expected folder structure by following these steps:

  1. Under the data folder, create an images folder and populate it with the images of the exercise; there should be exactly 31,000 images.
  2. From the data folder, launch python3 separate_data.py. This script moves the images into 4 different folders:
    • train/images
    • test/images
    • val/images
    • unlabeled/images
  3. That's it, you are ready.

Prepare the model

Create a model folder in the root of this repo and download the following model into it: https://drive.google.com/file/d/119RkfoVjpSrL2-fw_-8HpMvKKnF-joJx/view?usp=sharing

Generate testing results

The results should already be available under data/test/result.csv in the format requested by the Colruyt team. To re-generate the expected CSV results file, launch the export_test.py script; you need to configure the following parameter:

  • model_path: path to your trained model; if it does not exist, a COCO-pretrained model is loaded instead (with poor results)

Training

To train the model, launch the training.py script; you need to configure the following parameters:

  • BATCH_SIZE: this value depends on the GPU memory available (default: 4)
  • EPOCHS: number of training epochs (default: 200)
  • checkpoint_model: the training starts by loading the weights of this model (if it does not exist, a COCO-pretrained model is loaded)
  • new_model: path where the new model will be stored
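
Concretely, the top of training.py exposes parameters roughly like these (the values are the defaults listed above; the new_model file name is a hypothetical example, not necessarily the name used in the script):

```python
BATCH_SIZE = 4    # depends on the GPU memory available
EPOCHS = 200      # number of training epochs

# Weights to resume from; if this file does not exist, a COCO-pretrained
# model is loaded instead.
checkpoint_model = "model/fasterrcnn_resnet50_fpn_bb5_p3.pt"

# Where the newly trained model will be stored (hypothetical file name).
new_model = "model/fasterrcnn_resnet50_fpn_new.pt"
```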

NOTE: The training set has been extended by manually labeling 300 extra images that were unlabeled and did not belong to the test set. This was done using the following repository: https://github.com/jsbroks/coco-annotator

Evaluation on validation set

A custom validation set of 100 difficult images has been labeled manually using the COCO-annotator repo. This validation set contains no images from the test set. You can assess your model by following these steps:

  • Under src, launch git clone https://github.com/philferriere/cocoapi.git
  • And install pycocotools with pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
  • In the prepare_eval.py script, set the model_name variable to your model
  • Launch python3 prepare_eval.py
  • The results are generated in the data/val/results folder
  • Move the evaluation.ipynb notebook to src/cocoapi/PythonAPI/demos
  • Configure the variables annDir, annFile and resFile according to your machine and your model name
  • Launch the evaluation.ipynb notebook

Visualize

To visualize the model predictions on the validation set, launch the visualize.py script; you need to configure the following parameter:

  • model_path: path to your trained model; if it does not exist, a COCO-pretrained model is loaded (with poor results)
  • Launch python3 visualize.py
  • Press q to go to the next image
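
The drawing step boils down to overlaying the predicted boxes on each image. A minimal sketch with Pillow (the actual script may use a different library such as OpenCV; boxes are assumed to be in the [x_min, y_min, x_max, y_max] format that torchvision detectors return):

```python
from PIL import Image, ImageDraw

def draw_boxes(image, boxes, labels=None, color=(255, 0, 0)):
    """Return a copy of image with one rectangle (and optional label) per box."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    for i, box in enumerate(boxes):
        draw.rectangle(list(box), outline=color, width=2)
        if labels:
            # Put the category label just inside the top-left corner of the box
            draw.text((box[0] + 2, box[1] + 2), str(labels[i]), fill=color)
    return out
```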

Expected folder structure

.
├── data
│   ├── images
│   ├── info.docx
│   ├── separate_data.py
│   ├── test
│   │   ├── images
│   │   ├── result.csv
│   │   └── test.json
│   ├── train
│   │   ├── images
│   │   ├── train_info.json
│   │   ├── train.json
│   │   └── train_results.json
│   ├── unlabeled
│   │   ├── convert_unlabeled.py
│   │   ├── images
│   │   ├── train_info_ext.json
│   │   └── val_info_ext.json
│   └── val
│       ├── images
│       ├── results
│       ├── val_info.json
│       └── val.json
├── environment.yml
├── model
│   └── fasterrcnn_resnet50_fpn_bb5_p3.pt
├── README.md
├── runs
└── src
    ├── cocoapi
    ├── ColruytDataset.py
    ├── export_test.py
    ├── prepare_eval.py
    ├── training.py
    ├── utils.py
    └── visualize.py