Wrapper for image augmentation for Deep Learning image classification tasks.
The main purpose of this wrapper is to create heavily augmented image data for Deep Learning training.
All images to be augmented should be placed in the src/main/data/source folder, and a minimum of 10 images is required.
At least 2,000 augmented examples will be created from every single image, and all of those files are already split into the train, test, and validation folders. This split is discussed in the next section.
Most posts and articles about Data Augmentation do not discuss the Data Leakage problem, and this is a big issue for Deep Learning models.
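In this wrapper, the train, test, and validation folders are strictly separated, so that all augmented versions of a given source image land in only one of them. Isolating the datasets this way avoids Data Leakage, which leads to overfitting that the test and validation scores fail to reveal.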
In simple terms, the mechanism works as follows:
- All images in the reshaped folder are loaded into an array and shuffled (using seed=42 for reproducibility);
- There's a fixed proportion for each generated set: 80% for training, 10% for test, and 10% for validation;
- After this shuffle, all images receive the augmentation effects and are placed in their respective folders.
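The steps above can be sketched in Python as follows. This is a minimal sketch, not the wrapper's actual code: the function name split_source_images is hypothetical, it uses the standard library's random.Random rather than whatever RNG the wrapper uses, and it rounds set sizes down with integer arithmetic, giving any remainder to validation (an assumption). What it shows is the order of operations: shuffle and split the source images first, and only then augment each set.

```python
import random

# Split proportions described above: 80% train, 10% test, 10% validation.
TRAIN_PCT, TEST_PCT = 80, 10


def split_source_images(filenames, seed=42):
    """Shuffle the source images and split them BEFORE any augmentation.

    Splitting first guarantees that every augmented copy of a given source
    image ends up in the same set, so no near-duplicates leak across sets.
    Hypothetical helper, not the wrapper's real API.
    """
    files = sorted(filenames)            # sort so the result depends only on the seed
    random.Random(seed).shuffle(files)   # local RNG seeded with 42
    n = len(files)
    n_train = n * TRAIN_PCT // 100       # integer arithmetic avoids float rounding
    n_test = n * TEST_PCT // 100
    return {
        "train": files[:n_train],
        "test": files[n_train:n_train + n_test],
        "validation": files[n_train + n_test:],  # remainder goes to validation
    }


if __name__ == "__main__":
    sample = [f"img_{i:02d}.jpg" for i in range(10)]
    for set_name, subset in split_source_images(sample).items():
        print(set_name, len(subset))
```

With the minimum of 10 source images, this gives 8 training, 1 test, and 1 validation image; at 2,000 or more augmented examples per image, that is at least 16,000 training, 2,000 test, and 2,000 validation files.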
Requirements:
- Docker 19+
- Docker Compose
All seeds are set to 42, including those for the imageio and imgaug libraries.
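A minimal sketch of what seeding every RNG with 42 can look like. It assumes the pipeline draws randomness from Python's random module, NumPy's global RNG, and imgaug's global RNG; seed_everything is a hypothetical name, and the sketch does not show anything imageio-specific. imgaug.seed is imgaug's function for seeding its global random state; the import is optional so the sketch still runs where imgaug is not installed, or does not import under a newer NumPy release.

```python
import random

import numpy as np


def seed_everything(seed=42):
    """Seed every global RNG the augmentation pipeline might touch (hypothetical helper)."""
    random.seed(seed)       # Python's built-in RNG
    np.random.seed(seed)    # NumPy's legacy global RNG
    try:
        import imgaug as ia
        ia.seed(seed)       # imgaug's global RNG, used by augmenters without their own seed
    except (ImportError, AttributeError):
        # imgaug missing, or incompatible with the installed NumPy version;
        # the other RNGs are still seeded.
        pass


if __name__ == "__main__":
    seed_everything(42)
    first = (random.random(), float(np.random.rand()))
    seed_everything(42)
    second = (random.random(), float(np.random.rand()))
    print(first == second)  # True: same seed, same random draws
```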
Because the wrapper uses the batch_augmentation module from the imgaug library, it runs on multiple cores by default. Do not add Python's multiprocessing module on top of it: images handled by different child workers may accidentally be augmented more than once.
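One likely reason for this warning, and an assumption on our part since the README does not spell it out, is that forked worker processes inherit the parent's random-number-generator state. Every worker then samples the same augmentation parameters, so the same augmentation is effectively produced more than once. The sketch below simulates this with Python's standard random module only (no processes are actually started) and contrasts it with a common remedy: giving each worker its own seed derived from the global seed of 42. draw_params and clone_rng are hypothetical helpers; imgaug's own multicore utilities are designed to handle worker seeding themselves, so this only illustrates the pitfall.

```python
import random


def draw_params(rng, n=3):
    """Stand-in for sampling augmentation parameters, e.g. rotation angles in degrees."""
    return [round(rng.uniform(-25, 25), 3) for _ in range(n)]


def clone_rng(rng):
    """Copy an RNG's exact state, mimicking what a forked child process inherits."""
    twin = random.Random()
    twin.setstate(rng.getstate())
    return twin


parent_rng = random.Random(42)

# Naive workers: each starts from a copy of the parent's RNG state.
naive = [draw_params(clone_rng(parent_rng)) for _ in range(2)]

# Safer workers: each gets its own seed derived from the global seed and its index.
derived = [draw_params(random.Random(42 + worker_id)) for worker_id in range(2)]

print(naive[0] == naive[1])      # True: both workers sample identical parameters
print(derived[0] == derived[1])  # False: the workers sample different parameters
```

Deriving seeds from the worker index keeps runs reproducible (worker 0 always uses seed 42, worker 1 seed 43) while preventing workers from repeating each other.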
- Go to the folder src/main/data/source and delete all the files it contains;
- Still in the folder src/main/data/source, add all the files that you want to be augmented. At least 10 files are required.
$ make && docker build -t sirius_image_augmentation . && docker-compose up
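Before running the command above, it can help to check that the source folder meets the 10-image minimum. The sketch below is not part of the wrapper: check_source_folder and the IMAGE_EXTENSIONS set are hypothetical, and it assumes the images are plain files directly inside src/main/data/source (no subfolders).

```python
from pathlib import Path

# Hypothetical set of accepted extensions; adjust to the formats you actually use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
MIN_IMAGES = 10  # minimum number of source images stated in this README


def check_source_folder(folder="src/main/data/source"):
    """Return the image files in `folder`, or raise ValueError if there are too few."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    if len(images) < MIN_IMAGES:
        raise ValueError(
            f"{folder} contains {len(images)} image(s); at least {MIN_IMAGES} are required"
        )
    return images
```

For example, calling check_source_folder() from the repository root returns the list of images it found, or stops with a clear error before Docker is started.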
- Error handling
- Logging
- "Tree-shaking" code
Because it's a docile, nocturnal, and beautiful cat from South America. No special reason. In doubt? See this video.