The repo contains all stuff to collect and process data for the project "This Football Jersey Doesn't Exist"
To start run:
pip install -r requirements.txt
To scrap images from https://www.footballkitarchive.com run:
python data_scraping.py download-images ~/all_jerseys_images_folder
all_jerseys_images_folder
-- the folder where the images will be stored. The structure of the folder
will be as follows:
all_jerseys_images_folder
|---- league_1
|---- team_1-season_1-kit_type.jpg
|---- team_2-season_1-kit_type.jpg
...
|---- league_2
...
Here team_1-season_1-kit_type.jpg
:
team_1
-- the name of the team (e.g.liverpool
)season_1
-- season (e.g.1990-91
)kit_type
-- the type of a kit (e.g.home
,away
etc.)
Unfortunately, some images have lousy quality, and some images have problems with the background removal step (see later). That's why you may need to filter dataset and separate these types of images.
To do it, run:
python data_scraping.py filter-images ~/all_jerseys_images_folder ~/jerseys_images_folder ./images_info/relevant_images.json
This command will save images from file img_info/relevant_images.json
(you can find this file in this repo) in ~/jerseys_images_folder
;
the rest images will stay at ~/all_jerseys_images_folder
.
https://www.footballkitarchive.com updates kits for new seasons;
you may want to pick some new jerseys with good quality that are not in img_info/relevant_images.json
.
To start training GAN, you need to prepare images for it: centre jerseys (jerseys are not centred by default) and remove background (a lot of images have different backgrounds; first experiments without background removing have shown bad quality results; after standardizing the background the quality of generated images has got much better).
To localize a jersey on an image, I've trained the detection model (weights you can find here) via super-gradients library and deepfashion2 dataset.
From the dataset, I've extracted images and bounding box annotations with classes short_sleeve_top
, long_sleeve_top
, short_sleeve_outwear
,
long_sleeve_outwear
, short_sleeve_dress
and long_sleeve_dress
; then I merged these classes in one class called
top
and fined tune pre-trained YOLO-NAS model for my task.
To predict bounding boxes for jerseys, run:
python jersey_preparation.py save-detection-results ~/yolo_nas_top_detection_model.pth ~/jerseys_images_folder ~/jerseys_images_folder/detection_results_dir
The command will predict bboxes and save to the folder jerseys_images_folder/detection_results_dir
three files:
one_box_predictions.json
(images with one bounding box), more_one_box_predictions.json
(images with more than one bounding box),
no_box_predictions.json
(images without any predictions).
You can additionally annotate images from files more_one_box_predictions.json
and no_box_predictions.json
.
Images in one_box_predictions.json
have one box, and you can use these detection results
(as is) to crop jerseys. But there can be issues with bbox localization on some images. If you want to use well-localised bboxes for filtered
images (see Images filtering part in Data Downloading chapter), you can use file images_info/jerseys_bounding_boxes.json
. I've manually fixed the bad
localised bboxes for images in the file.
As I mentioned above, the important step of data processing is background removal. Experiments with white background have
shown a much better quality of generated images. There are a lot of jerseys on the white background in the initial dataset.
The images for which the background needs to be removed, are stored in the file images_info/images_to_remove_background.json
.
To make the background on these images white, run:
python jersey_preparation.py remove-background ~/jerseys_images_folder ./images_info/images_to_remove_background.json
If you want to save images with initial background to the separate folder, run:
python jersey_preparation.py remove-background ~/jerseys_images_folder ./images_info/images_to_remove_background.json \\
--save_original --path_to_save_original ~/original_images_with_background_folder
To standardise images (crop to jerse's box, resize and pad to one shape), run:
python jersey_preparation.py prepare-images ~/jerseys_images_folder ./images_info/jerseys_bounding_boxes.json \\
~/prepared_images --size 512
Finally, you have the dataset ready for GAN training!