- Collect data: take photos with multiple parts in each image
- Prepare dataset: segment out individual parts and sort the results
- Finalize dataset: create a train/validation split
- Train a model: fine-tune an existing model pretrained on ImageNet
- Test the model
We use Python 3 so that TensorFlow can run on Windows.
dataset root/
- raw - raw photos from camera (user input)
- downsampled - raw photos downsampled to 1024 px on the longer side (generated from the raw photos)
- masks - masks generated from the downsampled images (4-value masks for the GrabCut algorithm). You can edit the masks in a graphics editor to improve segmentation.
- segmentation - results of the GrabCut segmentation; all background pixels are set to zero.
- parts - individual parts extracted from each image
- sorted - user-sorted parts. Each subdirectory contains parts belonging to the same class, and the subdirectory name is used as the class label.
- test - raw photos to use as a test set. File names are used as the true labels.
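The 'downsampled' and 'segmentation' stages can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code. The helper names are hypothetical, and images are plain nested lists rather than OpenCV/NumPy arrays. The mask values follow OpenCV's GrabCut convention (cv2.GC_BGD=0, cv2.GC_FGD=1, cv2.GC_PR_BGD=2, cv2.GC_PR_FGD=3), and it is an assumption that this project's masks use the same encoding.

```python
# Hypothetical helpers sketching the 'downsampled' and 'segmentation' stages.
# Mask values follow OpenCV's GrabCut constants; that this project uses the
# same encoding is an assumption.
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3
MAX_SIDE = 1024  # target length of the longer side, in pixels
BLACK = (0, 0, 0)


def downsampled_size(width, height, max_side=MAX_SIDE):
    """Return (width, height) scaled so the longer side is max_side, keeping the aspect ratio.

    Assumption: images whose longer side is already <= max_side are not upscaled.
    """
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    return round(width * scale), round(height * scale)


def zero_background(image, mask):
    """Keep pixels marked (probable) foreground in the 4-value mask; set the rest to black.

    image: list of rows of RGB tuples; mask: list of rows of GC_* values, same shape.
    """
    keep = (GC_FGD, GC_PR_FGD)
    return [
        [pixel if value in keep else BLACK for pixel, value in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

With OpenCV, `cv2.grabCut` called in `cv2.GC_INIT_WITH_MASK` mode refines such a 4-value mask. Editing the mask by hand only changes which pixels start as definite or probable foreground or background.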
python3 PrepareDataset.py data_root_dir
Runs the segmentation step. The procedure is as follows:
For each raw image in the 'raw' directory:
- create the downsampled image if it does not already exist
- create the 4-value foreground segmentation mask for GrabCut in the 'masks' directory if it does not already exist
- run GrabCut to segment out the background (if the segmentation result does not already exist)
- extract individual parts from the segmentation results (existing results are always overwritten)
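One way to implement the last step is connected-component labelling followed by cropping each component's bounding box. This assumes each part appears as a separate blob of non-zero pixels in the segmentation result. The sketch below uses plain Python lists and hypothetical names (`find_parts`, `crop`, `min_area`). On arrays, a library routine such as OpenCV's `cv2.connectedComponentsWithStats` does the same job.

```python
from collections import deque


def find_parts(foreground, min_area=1):
    """Return inclusive bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions.

    foreground: list of rows of truthy/falsy values (truthy = part pixel).
    Regions smaller than min_area pixels are discarded as noise (hypothetical parameter).
    """
    height = len(foreground)
    width = len(foreground[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not foreground[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from this unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and foreground[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes


def crop(image, box):
    """Cut the inclusive bounding box out of a list-of-rows image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]
```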
python3 FinalizeDataset.py data_root_dir
Creates a train/validation split from the 'sorted' directory.
The resulting files, train.txt and val.txt, are written to the data root directory.
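The split ratio and line format used by FinalizeDataset.py are not documented here, so the following is only a sketch. It performs a stratified split with a hypothetical 20% validation fraction and writes one tab-separated `<relative path>\t<class name>` line per image. The function name, the accepted file extensions and the output format are all assumptions.

```python
import random
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: which files count as images


def split_sorted(data_root, val_fraction=0.2, seed=0):
    """Write train.txt and val.txt from data_root/sorted/<class>/<image>.

    Each output line is '<path relative to data_root>\t<class name>' (hypothetical format).
    The split is stratified: each class contributes about val_fraction of its images to val.
    """
    root = Path(data_root)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for class_dir in sorted(p for p in (root / "sorted").iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_EXTENSIONS)
        rng.shuffle(files)
        n_val = round(len(files) * val_fraction)
        for i, f in enumerate(files):
            line = f"{f.relative_to(root).as_posix()}\t{class_dir.name}"
            (val if i < n_val else train).append(line)
    (root / "train.txt").write_text("\n".join(train) + "\n")
    (root / "val.txt").write_text("\n".join(val) + "\n")
    return train, val
```

Splitting per class keeps rare classes from disappearing entirely from the training set. A class with very few images may still end up with no validation examples.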
python3 Finetune.py --model A --snapshot weights-784-0.969.hdf5
Runs training.
--model specifies the name of the model to use.
--snapshot file [optional] specifies a snapshot file to restart training from.
--debug_epochs N [optional] specifies the number of training epochs for which the images fed to the network are saved. These images are written to the data_root_dir/debug directory.
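The argument handling in Finetune.py is not shown in this README. A hypothetical `argparse` definition matching the documented options might look like this. Making --model required and defaulting --debug_epochs to 0 are assumptions.

```python
import argparse


def build_finetune_parser():
    """Hypothetical parser mirroring the documented Finetune.py options."""
    parser = argparse.ArgumentParser(description="Fine-tune an ImageNet-pretrained model.")
    # Assumption: the model name must always be given.
    parser.add_argument("--model", required=True, help="name of the model to use, e.g. A")
    parser.add_argument("--snapshot", default=None,
                        help="snapshot file to restart training from")
    # Assumption: 0 disables saving of debug images.
    parser.add_argument("--debug_epochs", type=int, default=0, metavar="N",
                        help="number of epochs whose input images are saved to data_root_dir/debug")
    return parser
```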
python3 Predict.py new_test\sorted measure --tta 2 --rtta 1 --model A --snapshot weights-784-0.969.hdf5
Runs prediction on a test set (here, the 'new_test\sorted' subdirectory).
There are two modes, 'measure' and 'sort':
- measure calculates top-1 and top-5 accuracies for images with known labels, stored in directories (one directory per class). Classification results and final accuracies are printed to the console.
- sort sorts images with unknown labels into directories; each directory name is the predicted class label.
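The top-1 and top-5 figures reported in 'measure' mode are standard top-k accuracies. The sketch below shows the computation on plain score lists. It assumes each image has one score per class, ordered like a hypothetical `class_names` list, and that the true label comes from the image's directory name.

```python
def top_k_accuracy(scores, true_labels, class_names, k):
    """Fraction of images whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per image, ordered like class_names.
    true_labels: the class name of each image (e.g. its directory name).
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if truth in (class_names[i] for i in ranked[:k]):
            hits += 1
    return hits / len(true_labels)
```

For example, with classes `["a", "b", "c"]`, an image of class "b" scored `[0.1, 0.3, 0.6]` is a top-1 miss but a top-2 hit.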
The --model and --snapshot arguments are the same as for the 'Finetune.py' script.
--tta sets the test-time data augmentation (TTA) level: 0 means no augmentation, 1 applies a vertical flip, 2 applies vertical and horizontal flips.
--rtta sets the robot test-time augmentation level: 1 or less means no robot TTA; 2 or more means that many images are taken from the sorted directory and their results are averaged.
--tta_mode is the aggregation method for TTA: 'mean' or 'majority'. The default is 'mean'.
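A minimal sketch of the documented TTA levels and the two aggregation modes follows. It makes several assumptions. Level 2 is taken to mean the original image plus one vertically and one horizontally flipped copy. "Vertical flip" is taken as reversing the row order. Each network output is a list of class probabilities. Only levels 0 to 2 are modelled because those are the ones described above. The same 'mean' aggregation could plausibly combine predictions from several images for --rtta, but that is also an assumption.

```python
from collections import Counter


def flip_vertical(image):
    """Flip upside down by reversing the row order."""
    return image[::-1]


def flip_horizontal(image):
    """Mirror left-right by reversing each row."""
    return [row[::-1] for row in image]


def tta_variants(image, tta):
    """Images to feed the network for --tta 0, 1 or 2 (assumed interpretation)."""
    variants = [image]
    if tta >= 1:
        variants.append(flip_vertical(image))
    if tta >= 2:
        variants.append(flip_horizontal(image))
    return variants


def aggregate(predictions, mode="mean"):
    """Combine per-variant class-probability lists into one predicted class index.

    'mean' averages the probabilities and takes the best class; 'majority' votes
    over each variant's top-1 class (ties go to the class voted for first).
    """
    n_classes = len(predictions[0])
    if mode == "mean":
        mean = [sum(p[c] for p in predictions) / len(predictions) for c in range(n_classes)]
        return max(range(n_classes), key=mean.__getitem__)
    if mode == "majority":
        votes = [max(range(n_classes), key=p.__getitem__) for p in predictions]
        return Counter(votes).most_common(1)[0][0]
    raise ValueError(f"unknown tta_mode: {mode}")
```

The two modes can disagree. For predictions `[[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]`, 'mean' picks class 0 because one variant is very confident, while 'majority' picks class 1 because two of the three variants prefer it.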
python3 data_root_dir camera_index --tta 3 --rtta 1 --model E --snapshot weights-745-0.875.hdf5
Runs predictions on images from a web camera.
data_root_dir - data root directory, needed to locate the model snapshots (stored in its 'snapshots' subdirectory)
camera_index - index of the web camera in the system. Default is 0.
--tta - test time augmentation level (see Predict.py arguments description)
--rtta - robot test time augmentation level (see Predict.py arguments description)
--model - model name to use
--snapshot - snapshot file to load. Snapshots should be stored in data_root_dir\snapshots\model_name\
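The snapshot lookup described above can be sketched with `pathlib`, which handles the Windows path separators. The function name `snapshot_path` is hypothetical. Raising an error for a missing file is a design choice of this sketch, not documented behaviour.

```python
from pathlib import Path


def snapshot_path(data_root_dir, model_name, snapshot_file):
    """Locate data_root_dir/snapshots/<model_name>/<snapshot_file> (hypothetical helper)."""
    path = Path(data_root_dir) / "snapshots" / model_name / snapshot_file
    if not path.is_file():
        # Fail early with the full path instead of letting the model loader fail later.
        raise FileNotFoundError(f"snapshot not found: {path}")
    return path
```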
These results were obtained with the following command-line parameters:
--tta 3 --rtta 3 --tta_mode mean
Top-1 and top-5 accuracies were measured on a test set of 386 files.
Best snapshot: model C, weights-860-0.917.hdf5
top1: 91.4%
top5: 95.7%
predict time per image: 900 ms
Best snapshot: model E, weights-745-0.875.hdf5
top1: 80.7%
top5: 94.6%
predict time per image: 347 ms
Best snapshot: model F, weights-557-0.910.hdf5
top1: 76.3%
top5: 94.6%
predict time per image: 514 ms