A collection of scripts to download data and to train and evaluate an image classifier on Open Images using TensorFlow.
- Create a list of all classes by image count
- Download images for custom lists of classes (using parallelization)
- Delete corrupt images
- Train a model of choice on the downloaded image dataset
- Evaluate the performance of the model (includes per-class accuracies)
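
As an illustration of the last item, per-class accuracy is the share of images of each true class that the model labels correctly. The sketch below is a minimal, library-free version. The function name `per_class_accuracy` and the example labels are hypothetical and are not taken from the repository's evaluation script.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class_name: accuracy} for every class that occurs in true_labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Number of images per true class.
    totals = Counter(true_labels)
    # Number of correctly predicted images per true class.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    y_true = ["cat", "cat", "dog", "dog", "dog"]
    y_pred = ["cat", "dog", "dog", "dog", "cat"]
    for name, acc in per_class_accuracy(y_true, y_pred).items():
        print(f"{name}: {acc:.2f}")
```

Reporting accuracy per class rather than as a single number makes it easier to spot classes whose downloaded images are too few or too noisy.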
Python 3.6 or higher
Package | Version |
---|---|
Pillow | 7.0.0 |
numpy | 1.18.5 |
requests | 2.22.0 |
tensorflow | 2.3.1 |
tensorflow-hub | 0.9.0 |
sklearn | 0.23.2 |
Other package versions may work too.
The packages can be installed from requirements.txt, for example with pip install -r requirements.txt
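
To see whether an environment matches the versions listed above, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`), and it assumes that the `sklearn` row refers to the `scikit-learn` distribution. The helper names `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata  # part of the standard library from Python 3.8 on

# Versions from the package table. "scikit-learn" is an assumption about
# which distribution the "sklearn" row refers to.
EXPECTED = {
    "Pillow": "7.0.0",
    "numpy": "1.18.5",
    "requests": "2.22.0",
    "tensorflow": "2.3.1",
    "tensorflow-hub": "0.9.0",
    "scikit-learn": "0.23.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch is not necessarily a problem, since other versions may work too. The check only shows where an environment differs from the tested one.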
- Download the Image IDs, Image labels, Boxes and Class Names from https://storage.googleapis.com/openimages/web/download.html (Train, Validation and Test of "Subset with Image-Level Labels" and Bounding Boxes of "Subset with Bounding Boxes")
- Create folders named out and processing
- Run the script 2_create_class_list_by_image_count.py
Output: out/class_list_by_image_count
- Choose the class names to train your classifier on from out/class_list_by_image_count and put them into a .txt file inside in/class_lists
Example:
- Adjust all options in config.py under # image download to your liking
- Run the script 4_delete_corrupt_images.py
- Adjust all options in config.py under # model training to your liking
- Run the script 5_train_model.py
Output:
Now you have a TensorFlow image classifier at out/saved_model
- If you killed the previous script because it took too long, run 6_extract_model_from_checkpoint.py
- DONE
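
To make the class-list step more concrete, the sketch below shows one way to rank classes by the number of images that carry them. It is not the code of 2_create_class_list_by_image_count.py. It assumes an image-label CSV with the header `ImageID,Source,LabelName,Confidence` and a headerless class-name CSV with rows of the form `label ID,display name`; check both against the files you downloaded. The function name `class_list_by_image_count` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def class_list_by_image_count(labels_file, class_names_file):
    """Return [(display_name, image_count)] sorted by descending image count."""
    # Map label IDs to readable names (assumed layout: ID, display name).
    names = {row[0]: row[1] for row in csv.reader(class_names_file) if len(row) >= 2}

    images_per_label = defaultdict(set)
    for row in csv.DictReader(labels_file):
        # Assumption: Confidence 1 marks a positive label, 0 a negative one.
        if float(row["Confidence"]) > 0:
            # A set counts each image once, even if the row is repeated.
            images_per_label[row["LabelName"]].add(row["ImageID"])

    counts = [(names.get(label, label), len(ids)) for label, ids in images_per_label.items()]
    # Most frequent class first; ties are broken alphabetically.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Tiny made-up inputs; real runs would pass open file handles instead.
    labels = io.StringIO(
        "ImageID,Source,LabelName,Confidence\n"
        "img1,verification,/m/a,1\n"
        "img2,verification,/m/a,1\n"
        "img2,verification,/m/b,1\n"
    )
    class_names = io.StringIO("/m/a,Cat\n/m/b,Dog\n")
    for name, count in class_list_by_image_count(labels, class_names):
        print(name, count)
```

Keeping a set of image IDs per label is simple but memory-hungry on the full training split; a plain counter is enough if each image and label pair appears only once.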
- The dataset is very noisy, so you might have to manually delete images that do not fit the label
- Make sure you have enabled GPU support: https://www.tensorflow.org/install/gpu
- Place your dataset on an SSD (500 MB/s should be enough) for faster training
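
A quick way to confirm the GPU hint is to ask TensorFlow which devices it can see before starting a long training run. The snippet below is a minimal sketch; the helper name `count_visible_gpus` is hypothetical, and `tf.config.list_physical_devices` requires TensorFlow 2.1 or newer.

```python
def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    gpus = count_visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif gpus == 0:
        print("TensorFlow sees no GPU; training will run on the CPU")
    else:
        print(f"TensorFlow sees {gpus} GPU(s)")
```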