The problem is divided into two tasks:

- Task 1: detection of the 2 most likely clothes categories out of 5 possible ones. This is a typical multi-class task (there is only one true category for a given item picture). For any given item picture, the objective is to get the probabilities of each of the 5 possible categories (blouses, casual_dresses, mini_dresses, shirts, tank_tops) and select the 2 highest ones.
- Task 2: detection of 3 potential tags. Here different tags might appear together; this is a typical multi-label task.
The strategy selected is a multi-task approach (here with 2 tasks): a single model performs both kinds of classification. In this model:

- Task 1 is a multi-class classification over the cc3 classes. The output for this task (see function transfer_learning_model) is a Dense(5) layer, to be matched with a one-hot-encoded version of the cc3 column (containing 5 classes). As it is a multi-class problem, the output activation for that task is a softmax, i.e., the probabilities of all classes sum to 1 (paired with a categorical cross-entropy loss).
- Task 2 is a multi-label classification over the polka dot, floral and checker columns. The output for this task (see function transfer_learning_model) is a Dense(3) layer, to be matched with these 3 columns (already in one-hot-encoded format). As it is a multi-label problem, the output activation for that task is a sigmoid, i.e., the probability of each label is independent of the other labels (paired with a binary cross-entropy loss).
The model makes use of transfer learning: it is composed of a convolutional base (a VGG16 network) and a dense output head for each task. See the model.png file for details.
It uses the tf.keras API, as it has become the standard go-to since TF 2.0.
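For reference, a minimal sketch of what such a two-head model can look like (layer sizes, the intermediate dense layer and the output names are assumptions; the actual implementation is in the transfer_learning_model function of the notebook / module.py):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

def two_head_model_sketch(input_shape=(224, 224, 3)):
    # frozen convolutional base (VGG16 pre-trained on ImageNet)
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)

    # task 1: multi-class head, softmax -> class probabilities sum to 1
    cc3_out = layers.Dense(5, activation="softmax", name="cc3")(x)
    # task 2: multi-label head, sigmoid -> independent label probabilities
    tags_out = layers.Dense(3, activation="sigmoid", name="tags")(x)

    model = Model(inputs, [cc3_out, tags_out])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss={"cc3": "categorical_crossentropy", "tags": "binary_crossentropy"},
        metrics={"cc3": "acc", "tags": "acc"},
    )
    return model
```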
Directory structure: (new since 2020-01-07: model 2 available)
.
├── data
└── model_training
├── tf_keras_model
├── tf_keras_model_1
├── tf_keras_model_2
└── saved_model
└── my_model
├── 1
│ ├── assets
│ └── variables
└── 2
├── assets
└── variables
Original files (detailed in homework_dl.md):

- data.zip: file containing the image files that you will use for prediction (not in the git repo as it is heavy; available here: https://vinted-ml-homework.s3.eu-central-1.amazonaws.com/OCJ4HAtw0xW/v3.zip)
- data.parquet: a label file
- test.parquet: a file for making predictions
- example_predictions.parquet: a file with example predictions
Solution files:

- Vinted_exercise.ipynb: the main notebook.
- predictions.parquet: the file containing the predictions of model 1 for the test.parquet file.
- predictions_model2.parquet: the file containing the predictions of model 2 (new since 2020-01-07: model 2 available) for the test.parquet file.
- module.py: a module containing helper functions.
- directory ./model_training: contains the model run logs.
- directory ./model_training/tf_keras_model_1: contains the model run with model architecture descriptions:
  - ./model_training/tf_keras_model_1/model-epoch09-cc3_loss1.2428-tags_loss0.2830-val_cc3_acc0.52-val_tags_acc0.88.h5: the tf.keras model containing both weights and model architecture.
  - ./model_training/tf_keras_model_1/model.png: the description of the model layers.
  - ./model_training/tf_keras_model_1/model_summary.txt: complementary description of the model layers, with the total number of parameters in the model.
  - ./model_training/tf_keras_model_1/Loss_Accuracy_plots.png: plot of the loss and accuracy over the epochs.
- directory ./model_training/saved_model: the tf.keras model converted to the TensorFlow .pb file format, used by the tensorflow-serving application (see the export sketch after this list).
New since 2020-01-07: model 2 available:

- directory ./model_training/tf_keras_model_2/ containing model 2:
  - ./model_training/tf_keras_model_2/model-epoch32-cc3_loss1.1084-tags_loss0.3010-val_cc3_acc0.59-val_tags_acc0.89.h5: the tf.keras model containing both weights and model architecture.
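As a reference for the saved_model directory above, a minimal sketch of that conversion, assuming the .h5 checkpoint loads without custom objects (the exact checkpoint file and version number may differ):

```python
import tensorflow as tf

# load the best tf.keras checkpoint (.h5, weights + architecture) ...
model = tf.keras.models.load_model(
    "model_training/tf_keras_model_2/"
    "model-epoch32-cc3_loss1.1084-tags_loss0.3010-"
    "val_cc3_acc0.59-val_tags_acc0.89.h5"
)

# ... and export it to the SavedModel (.pb) format under a numeric version
# folder, as expected by tensorflow-serving
tf.keras.models.save_model(
    model, "model_training/saved_model/my_model/2", save_format="tf"
)
```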
The notebook is divided into 3 parts:
- Data loading, exploration and preparation
- Multi-task Modelling
- Serving the model using TF-SERVING
In part 1, the datasets are:
- loaded,
- prepared (label encoding and categorical encoding); the uuid column is completed with the ".jpg" extension,
- analysed: a simple EDA shows data imbalance. The correlation matrix shows the potential of a multi-task model, since some correlations exist between the cc3 and tag features (example: the shirt-checker correlation),
- downsampled (if that option is turned on, as in model 1; in model 2 it is turned off). See the preparation sketch after this list.
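A minimal sketch of those preparation steps (column names such as uuid and cc3 follow the description above, and the file path is an assumption; the exact code lives in part 1 of the notebook and in module.py):

```python
import pandas as pd

df = pd.read_parquet("data/data.parquet")

# complete the uuid column so it matches the image file names
df["uuid"] = df["uuid"].astype(str) + ".jpg"

# one-hot encode the 5 cc3 classes for the multi-class head
# (the 3 tag columns are assumed to already be 0/1 encoded)
df = pd.concat([df, pd.get_dummies(df["cc3"])], axis=1)

# optional downsampling: reduce every cc3 class to the size of the smallest one
def downsample(frame, col="cc3", seed=42):
    n_min = frame[col].value_counts().min()
    return (frame.groupby(col, group_keys=False)
                 .apply(lambda g: g.sample(n_min, random_state=seed)))
```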
In part 2, the multi-task model is:
- declared (with all model hyperparameters, as well as options like data augmentation),
- trained. Note that 4 dataset generators are created:
  - the train dataset,
  - the validation dataset,
  - the heldout dataset: a dataset drawn from the same dataframe as train and validation, held out in order to compute performance metrics,
  - the test dataset, on which the model is applied. The results are written to predictions.parquet (for model 1) or predictions_model2.parquet (for model 2).

Note that the outputs of a run (plots, checkpoints, ...) are saved in the (empty) tf_keras_model directory; see the checkpoint sketch below. When a model is satisfactory, please rename that folder to tf_keras_model_N (N being the index of your model) and re-create an empty tf_keras_model folder for the next run.
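A minimal sketch of the checkpoint callback producing the file names found in ./model_training/tf_keras_model_1 (the output names cc3 and tags, the metric names and the monitored quantity are inferred from those file names, not taken from the notebook):

```python
import tensorflow as tf

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=("model_training/tf_keras_model/"
              "model-epoch{epoch:02d}"
              "-cc3_loss{cc3_loss:.4f}-tags_loss{tags_loss:.4f}"
              "-val_cc3_acc{val_cc3_acc:.2f}-val_tags_acc{val_tags_acc:.2f}.h5"),
    monitor="val_loss",
    save_best_only=True,
)
# passed to model.fit(..., callbacks=[checkpoint_cb])
```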
For the 3rd part, the serving part, first run the cells of part 1. There is no need to run the cells of part 2 (no need to train a new model; the latest saved one is simply loaded).
Part 3 saves the keras model in the saved_model folder. Then (after some inspection of the model), the model can be served using TF-serving. The principle is to launch a TF-serving server that loads the model and listens for requests. Do this with the following bash command (open a terminal in the main directory):
tensorflow_model_server --model_base_path=$(pwd)/model_training/saved_model/my_model/ --rest_api_port=9000 --model_name=my_model
Note that the model loaded will be the latest version available, hence here model 2.
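A minimal sketch of a prediction request against that server (the image path and the preprocessing are assumptions; the URL follows the standard tensorflow-serving REST scheme for the port and model name used above):

```python
import json
import requests
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# load one image and scale it the same way as assumed during training
img = img_to_array(load_img("data/some_image.jpg", target_size=(224, 224))) / 255.0

payload = {"instances": [img.tolist()]}
resp = requests.post(
    "http://localhost:9000/v1/models/my_model:predict",
    data=json.dumps(payload),
)
print(resp.json())  # {"predictions": [...]} containing the cc3 and tags outputs
```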
Main differences between model 1 and model 2:

- Model 1 has a VGG16 convolutional base, while model 2 has a VGG19 base.
- While model 1 has an input image of height x width 301x217, model 2 has a square 224x224 input image (to follow standard practice).
- Model 1 starts with the default learning rate of 1e-3 for the Adam optimizer and has been run for 15 epochs. Model 2 starts with a lower learning rate of 1e-4 and has been run for 40 epochs.
- Model 1 uses data downsampling: the "blouse" class of the cc3 feature is downsampled so that all classes have the same number of samples. The performance metrics show that model 1 performs too poorly on that class (and much better on the other ones). As shown by the confusion matrix analysis, model 2 performs more evenly on all classes; that is the main argument in favour of the second model. (Note: both the float precision and downsampling options are configurable in the "Main parameters of project" cell in part 1.)
Other additions:

- Added a confusion matrix for the cc3 classes.
- Added ROC and Precision-Recall curves for the binary tag labels, as well as confusion matrices for each tag. The probability threshold was adapted to 0.4, as it turned out to be better suited than 0.5 (more balanced).
- Implemented mixed-precision training, i.e. the training uses float16 precision in place of float32 when possible (especially meaningful during convolution operations). If trained on RTX graphics cards (which possess Tensor Cores), the model could potentially be accelerated by a factor of two. In any case, the model is half the size compared to a run using float32 precision. Note that in TF 2.0 the input and output layers of the network should be kept in dtype=tf.float32; from TF 2.1 on this is no longer a requirement. See the sketch below.
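For reference, a minimal sketch of enabling the Keras mixed-precision policy with the current API (TF >= 2.4); the notebook, written against TF 2.0/2.1, would use the older experimental variant of the same mechanism:

```python
import tensorflow as tf

# compute in float16 where possible, keep variables in float32
tf.keras.mixed_precision.set_global_policy("mixed_float16")   # TF >= 2.4
# (TF 2.1-2.3 equivalent: tf.keras.mixed_precision.experimental.set_policy(
#      tf.keras.mixed_precision.experimental.Policy("mixed_float16")))

# as noted above, the final outputs should stay in float32, e.g. by giving
# the last layer an explicit dtype:
# outputs = tf.keras.layers.Activation("linear", dtype="float32")(x)
```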
The model was run on an Ubuntu 18.04 machine, using a GTX 1060 GPU. One epoch lasts ~90 s.
The Python environment, with the versions of the packages, is detailed in the conda_list.txt file.
Possible improvements:

- Unit testing
- Use MLflow for consistent model tracking
- Use logging
- Use TensorBoard for performance tracking during a given run
- Hyperparameter tuning using cross-validation
- Implement fine-tuning by unfreezing the last convolutional block of the VGG16 (see the sketch after this list)
- Try different architectures (a few have been tried, but many more could be explored)
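A sketch of the fine-tuning idea above: unfreeze only the last convolutional block (block5) of the VGG16 base and retrain the whole model with a low learning rate (the layer names follow the standard Keras VGG16 application):

```python
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False)
base.trainable = True
for layer in base.layers:
    # keep everything before block5 frozen
    if not layer.name.startswith("block5"):
        layer.trainable = False

# the multi-task model built on top of this base would then be recompiled
# with a small learning rate (e.g. Adam(1e-5)) before continuing training
```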