This code trains a Faster R-CNN architecture with a ResNet-50 backbone to detect and categorize 60 types of Colruyt products (plus the background class). The code is written in PyTorch.
To install the conda environment on which I developed the solution:
conda env create -f environment.yml
Then activate it:
conda activate gym_env
This repository contains no images because they are too heavy to upload. However, you can build the expected folder structure by following these steps:
- Under the data folder, create an images folder and populate it with the images of the exercise. There should be exactly 31,000 images.
- Under the data folder, launch
python3 separate_data.py
- This script moves the images into 4 different folders:
  - train/images
  - test/images
  - val/images
  - unlabeled/images
- That's it, you are ready.
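To check that the steps above produced the expected layout, a small script along these lines can count the images in each split. It is a minimal sketch, not part of the repository: the function name count_images and the set of image extensions are assumptions, since this README does not state the image format.

```python
from pathlib import Path

# Sub-folders that separate_data.py is described as creating under data/.
EXPECTED_SPLITS = ("train", "test", "val", "unlabeled")

# Assumption: the README does not say which extensions the images use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(data_dir):
    """Return {split: image count} for each expected <split>/images folder.

    A split whose images folder is missing maps to None.
    """
    data_dir = Path(data_dir)
    counts = {}
    for split in EXPECTED_SPLITS:
        images_dir = data_dir / split / "images"
        if not images_dir.is_dir():
            counts[split] = None
            continue
        counts[split] = sum(
            1
            for p in images_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Run from the root of the repository.
    for split, n in count_images("data").items():
        status = "missing" if n is None else f"{n} images"
        print(f"{split}/images: {status}")
```

If every split reports a count and none is missing, the folder structure matches what the other scripts expect.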
Create a model folder in the root of this repo and download the following model into it: https://drive.google.com/file/d/119RkfoVjpSrL2-fw_-8HpMvKKnF-joJx/view?usp=sharing
The results should already be available under data/test/result.csv in the format requested by the Colruyt team. To regenerate the expected CSV results file, launch the export_test.py script. You need to configure the following parameter:
- model_path: path to your trained model. If it does not exist, a pretrained COCO model is loaded (with poor results).
To train the model, launch the training.py script. You need to configure the following parameters:
- BATCH_SIZE: this value depends on the GPU memory available (default: 4)
- EPOCHS: number of epochs to train for (default: 200)
- checkpoint_model: training starts from the weights of this model (if it does not exist, a pretrained COCO model is loaded)
- new_model: path where the new model is stored
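The fallback described for checkpoint_model can be sketched as follows. This is only an illustration, not the code in training.py: the TrainConfig class and the initial_weights method are hypothetical names. Only the four parameter names and the two defaults come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainConfig:
    """Hypothetical container for the parameters listed above."""

    checkpoint_model: str  # weights to start from, if the file exists
    new_model: str         # where the newly trained model is stored
    BATCH_SIZE: int = 4    # default stated in this README
    EPOCHS: int = 200      # default stated in this README

    def initial_weights(self):
        """Return 'checkpoint' if checkpoint_model exists, else 'coco'."""
        if Path(self.checkpoint_model).is_file():
            return "checkpoint"
        # Fall back to the pretrained COCO model, as described above.
        return "coco"
```

Printing the result of initial_weights() before a long run makes it obvious whether training resumes from your own checkpoint or silently starts from the COCO weights.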
NOTE: The training set has been extended by manually labeling a set of 300 extra images that were unlabeled and that did not belong to the test set. This has been done using the following repository: https://github.com/jsbroks/coco-annotator
A custom validation set of 100 difficult images has been labeled manually using the COCO-annotator repo. This validation set does not contain images from the test set. You can assess your model by following these steps:
- Under src, launch
git clone https://github.com/philferriere/cocoapi.git
- And install pycocotools with
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
- In the prepare_eval.py script, set the model_name variable to your model
- Launch
python3 prepare_eval.py
- A result is generated in the folder data/val/results
- Go to src/cocoapi/PythonAPI/demos and move evaluation.ipynb notebook to this directory
- Configure the variables annDir, annFile and resFile according to your machine and your model name
- Launch the evaluation.ipynb notebook
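pycocotools reads detections in the COCO results format, where each entry carries an image_id, a category_id, a bbox given as [x, y, width, height], and a score. torchvision's Faster R-CNN (which the model file name suggests is used here) returns boxes as [x1, y1, x2, y2], so a conversion is needed before evaluation. The sketch below shows that conversion, assuming prepare_eval.py does something similar. The function names are hypothetical and the actual script may differ.

```python
import json


def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] corners to COCO's [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def to_coco_results(image_id, boxes, labels, scores):
    """Build COCO-style result entries for the detections of one image.

    Assumption: the predicted label ids are already the category ids
    used in the annotation file. If they are not, map them first.
    """
    return [
        {
            "image_id": image_id,
            "category_id": int(label),
            "bbox": [float(v) for v in xyxy_to_xywh(box)],
            "score": float(score),
        }
        for box, label, score in zip(boxes, labels, scores)
    ]


if __name__ == "__main__":
    # One made-up detection, only to show the shape of the output.
    results = to_coco_results(1, [[10, 20, 110, 70]], [3], [0.9])
    print(json.dumps(results, indent=2))
```

A list of such entries, dumped to a JSON file, is what the resFile variable in the notebook is expected to point to.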
To visualize the model on the validation set, launch visualize.py. You need to configure the following parameter:
- model_path: path to your trained model. If it does not exist, a pretrained COCO model is loaded (with poor results).
- Launch
python3 visualize.py
- Press q to go to the next image
.
├── data
│   ├── images
│   ├── info.docx
│   ├── separate_data.py
│   ├── test
│   │   ├── images
│   │   ├── result.csv
│   │   └── test.json
│   ├── train
│   │   ├── images
│   │   ├── train_info.json
│   │   ├── train.json
│   │   └── train_results.json
│   ├── unlabeled
│   │   ├── convert_unlabeled.py
│   │   ├── images
│   │   ├── train_info_ext.json
│   │   └── val_info_ext.json
│   └── val
│       ├── images
│       ├── results
│       ├── val_info.json
│       └── val.json
├── environment.yml
├── model
│   └── fasterrcnn_resnet50_fpn_bb5_p3.pt
├── README.md
├── runs
└── src
    ├── cocoapi
    ├── ColruytDataset.py
    ├── export_test.py
    ├── prepare_eval.py
    ├── training.py
    ├── utils.py
    └── visualize.py