Implementation of the Flower project for the Data Science 1 course.
This code trains two models to classify an image into one of the following classes:
- daisy
- dandelion
- rose
- sunflower
- tulip
To run this code, you need to download the images from the following datasets:
- https://www.kaggle.com/mgornergoogle/five-flowers
- https://www.kaggle.com/ianmoone0617/flower-goggle-tpu-classification
and put them into the folders data/data1 (five-flowers) and data/data2.
Important: The folder structure must appear as follows:
data/
    [data1, data2]/
        flowers/
Otherwise you will not be able to re-run the code.
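Before running anything, the layout can be checked with a few lines of Python. This is only a sketch: it assumes that each of data/data1 and data/data2 contains a flowers/ sub-folder, and the helper name missing_dirs is made up for illustration and is not part of the project.

```python
from pathlib import Path

# Assumed layout: data/data1/flowers and data/data2/flowers.
EXPECTED_DIRS = [Path("data1") / "flowers", Path("data2") / "flowers"]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [str(root / d) for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("data")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Folder layout looks fine.")
```

Run it from the project root, so that the relative path data/ resolves correctly.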
It is recommended to run the code in a virtual environment (e.g. virtualenv). requirements.txt contains all the required Python libraries.
Alternatively, the environment is already set up in the Docker container ody55eus/flowers.
The following preprocessing steps are performed by data_preparation.py:
- Merge both datasets into one.
- Delete all pictures that do not belong to one of the 5 given classes.
- Identify and delete duplicate images.
- Split the dataset into train (80%) and test (20%) data.
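The duplicate removal and the 80/20 split can be sketched as follows. This is not the code from data_preparation.py: the function names are hypothetical, duplicates are assumed to be byte-identical files (found via SHA-256), and the seed is an arbitrary choice for reproducibility.

```python
import hashlib
import random
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def drop_exact_duplicates(paths):
    """Keep the first file (in sorted order) for every distinct content hash."""
    seen = set()
    unique = []
    for path in sorted(paths):
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and split into (train, test)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    return items[n_test:], items[:n_test]
```

Hashing the file contents only catches exact copies. Resized or re-encoded versions of the same picture would need a perceptual comparison instead.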
The convolutional neural network (CNN) training can be reproduced with the file cnn_training.py.
Before training the CNN, the training data are split again into train (80%) and validation (20%) data (this is performed by cnn_training.py or cnn_split_test_val.py).
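Because the second split is applied only to the training portion, the shares relative to the full dataset differ from 80/20. The short calculation below makes this explicit; the function name is illustrative only.

```python
TEST_FRACTION = 0.2        # first split, done during preprocessing
VALIDATION_FRACTION = 0.2  # second split, applied to the training data only


def overall_fractions(test_fraction=TEST_FRACTION,
                      validation_fraction=VALIDATION_FRACTION):
    """Return each subset's share of the full dataset."""
    remaining = 1.0 - test_fraction
    validation = remaining * validation_fraction
    train = remaining - validation
    return {"train": train, "validation": validation, "test": test_fraction}


for name, share in overall_fractions().items():
    print(f"{name}: {share:.0%}")
# train: 64%
# validation: 16%
# test: 20%
```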
If you choose a higher number of epochs, you can probably improve the result.
The training was run on Google Colab to take advantage of GPU acceleration.
The evaluation of the CNN models can be reproduced by running cnn_evaluation.py.
The training and evaluation of the support vector machine can be reproduced with svm.ipynb.
The report and presentation can be found on GitLab.
The code is released under the MIT license.