/lego_detector

Lego parts classifier using deep neural networks

Primary LanguagePython

Lego parts classifier

Workflow

  • Collect data: make photos of multiple parts in each image
  • Prepare dataset: segment out individual parts and sort the results
  • Finalize dataset: create train/validation split
  • Train a model: finetune existing model trained on ImageNet
  • Test model

Data Repository

Google Drive Folder

Python

We are using Python 3 to be able to run Tensorflow on Windows

Dataset directory structure

dataset root/

  • raw - raw photos from camera (user input)
  • downsampled - raw photos downsampled to 1024 px on longer side (generated from raw photos)
  • masks - masks generated from downsamled raw images (these are 4-value masks for GrabCut algorithm). You can edit masks in graphical editor to improve segmentation.
  • segmentation - segmentation results of GrabCut algorithm - all background pixels are zeroed.
  • parts - contains extracted individual parts for each image
  • sorted - user sorted parts. Each subdirectory contains parts belonging to the same class. Name of subdirectory is regarded as class label name
  • test - raw photos to use as a test set. File names are regarded as true labels.

Workflow commandlines

python3 PrepareDataset.py data_root_dir
Runs segmentation step. The procedure is as follows:
For each raw image in 'raw' directory

  • create downsampled image if not exists already
  • create foreground segmentation 4-value mask for GrabCut if not exists already in 'masks' dir
  • run GrabCut to segment out background (if segmentation result does not exist already)
  • extract individual parts from segmentation results (always overwrites results)


python3 FinalizeDataset.py data_root_dir
Creates train/validation split using 'sorted' directory. Results - train.txt and val.txt - are created in data root dir.


python3 Finetune.py --model A --snapshot weights-784-0.969.hdf5
Runs training.
--model specifies model name to use.
--snapshot file [optional] specifies a snapshot file to restart training from.
--debug_epochs N [optional] specifies number of training epochs to save images fed to network. These images will be written in data_root_dir/debug directory.


python3 Predict.py new_test\sorted measure --tta 2 --rtta 1 --model A --snapshot weights-784-0.969.hdf5
Runs prediction on test set (from 'new_test\sorted' subdirectory).
There are two modes: 'measure' and 'sort'

  • measure will calculate top1 and top5 accuracies given images with known labels, stored to directories (one directory - one class). Classification results and final accuracies are written to console.

  • sort will sort images with unknown labels into directories - dir name will correspond to predicted class label.

--model and --snapshot arguments are the same as for 'Finetune.py' script.
--tta 0 means no test time data augmentation, 1 - do vertical flip, 2 - do vertical and horizontal flips.
--rtta <1 means no robot test time data augmentation, 2 and more means take that many images from sorted directory and average results. --tta_mode 'mean' or 'majority' aggregation method for TTA. Default is 'mean'.


python3 data_root_dir camera_index --tta 3 --rtta 1 --model E --snapshot weights-745-0.875.hdf5
Runs predictions from web camera.
data_root_dir - data root dir. Needed to find models snapshots (which are in subdirectory 'snapshots')
camera_index - index of web camera in the system. Default is 0
--tta - test time augmentation level (see Predict.py arguments description)
--rtta - robot test time augmentation level (see Predict.py arguments description)
--model - model name to use
--snapshot - snapshot file to load. Snapshots should be stored in data_root\snapshots\model_name\

Results

These results were obtained using command line params as follows:
--tta 3 --rtta 3 --tta_mode mean
Top 1 and top 5 accuracies were measured on a test set of 386 files

ResNet50 (24M params)

Best snapshot: model C, weights-860-0.917.hdf5
top1: 91.4%
top5: 95.7
predict time per image: 900 ms

MobileNet (5M params)

Best snapshot: model E, weights-745-0.875.hdf5
top1: 80.7%
top5: 94.6%
predict time per image: 347 ms

InceptionV3 (24M params)

Best snapshot: model F, weights-557-0.910.hdf5
top1: 76.3%
top5: 94.6%
predict time per image: 514 ms