L&T EduTech Hackathon Solution - Yash Khandelwal

This repository is my solution for Problem Statement 3. IMG

Background

Natural disasters and atmospheric anomalies demand remote monitoring and maintenance of naval objects, especially large ships. For example, under poor weather conditions, prior knowledge of a ship's model 🚢 and type helps the automatic docking process run smoothly. A ship or vessel detection system can therefore be developed and used in a wide range of applications 📊: maritime safety and disaster prevention, fisheries management, marine pollution monitoring, defense and maritime security, protection from piracy, illegal migration, and more. In this repository, I showcase the development of a Deep Learning model that can be deployed in an automated system to identify the ship type solely from images taken by survey boats ✔️

Data Exploration

Data Source

There are 6,252 images in the training data & 2,680 images in the test data. The images belong to 5 classes, namely:

  • Cargo
  • Military
  • Carrier
  • Cruise
  • Tankers

Random Sample of images from each class:

IMG IMG

Image count of each class in the training data, before & after the train-validation split:

IMG

  • The dataset has a decent balance of class images in the training data, which is preserved even after the train-validation split (random_state = 42); a split sketch follows below.
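
For reference, a minimal sketch of such a stratified split, assuming the labels live in a hypothetical train.csv with a category column (the 85/15 ratio comes from the Results section below):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical metadata file: one row per image with its class label.
df = pd.read_csv("train/train.csv")  # assumed columns: 'image', 'category'

train_df, val_df = train_test_split(
    df,
    test_size=0.15,           # 85% train / 15% validation
    random_state=42,          # the seed noted above
    stratify=df["category"],  # preserves the class balance after the split
)
print(len(train_df), len(val_df))
```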

Modelling

I used transfer learning to develop a Deep Learning model for the given task.

I used two approaches for developing models and selected the most optimal one based on the Cohen's Kappa metric: IMG
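
For context, Cohen's kappa measures agreement between predicted and true labels while correcting for chance agreement: κ = (pₒ − pₑ) / (1 − pₑ), where pₒ is the observed agreement (plain accuracy) and pₑ is the agreement expected by chance from the class marginals. κ = 1 indicates perfect agreement, while κ ≈ 0 means the model does no better than chance.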

In approach 1, the head has a Flatten layer followed by a hidden layer with 512 neurons & then a prediction layer with 5 neurons (one per class).

In approach 2, I used GlobalPooling instead of the Flatten + hidden layers in the head (see the sketch after this list). Here are a few reasons:

  • Global pooling condenses each feature map into a single value, summarizing the relevant information in a compact vector that a single dense classification layer can consume, instead of requiring multiple layers.
  • It's typically applied as average pooling (GlobalAveragePooling2D) or max pooling (GlobalMaxPooling2D) and can work for 1D and 3D input as well.
  • Note that flattening the bottleneck feature maps of networks like ResNet yields vectors with tens of thousands of features, while global pooling reduces the same maps to only a few thousand values. When flattening, you're torturing your network to learn from huge, oddly-shaped vectors in a very inefficient manner.
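
For illustration, a minimal Keras sketch of the two heads. The 224×224 input size is an assumption, though it reproduces exactly the 51,383,301 trainable parameters quoted for approach 1 below (Flatten of 7×7×2048 → 100,352 features → Dense(512) → Dense(5)):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # transfer learning: freeze the convolutional base

# Approach 1: Flatten + hidden Dense layer + 5-way softmax head.
head_1 = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(5, activation="softmax"),
])

# Approach 2: GlobalAveragePooling feeding the classifier directly.
head_2 = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),
])

head_1.summary()  # far more trainable parameters than head_2
```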

Pre-trained Models used in modelling:

  • ResNet50
  • VGG16
  • Xception

Evaluation of the above models on ImageNet data:

| Model    | Size (MB) | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth | Time (ms) per inference step (CPU) |
|----------|-----------|----------------|----------------|------------|-------|------------------------------------|
| Xception | 88        | 79.0%          | 94.5%          | 22.9M      | 81    | 109.4                              |
| VGG16    | 528       | 71.3%          | 90.1%          | 138.4M     | 16    | 69.5                               |
| VGG19    | 549       | 71.3%          | 90.0%          | 143.7M     | 19    | 84.8                               |
| ResNet50 | 98        | 74.9%          | 92.1%          | 25.6M      | 107   | 58.2                               |

I went with these models due to their state-of-the-art performance in image classification.

Experimental Results on the Ship Dataset

Approach 1

I began with ResNet50 to obtain a baseline model.

  • Augmentations used = rotation, horizontal flip, width shift, height shift + pixel values scaled to 0-1 (a configuration sketch follows this list)
  • Batch size = 64
  • Learning rate = 1e-4
  • Optimizer = RMSProp
  • Loss = Categorical Cross Entropy
  • Metric = Cohen Kappa
  • Epochs = 200
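
A minimal sketch of this configuration, reusing `head_1` from the earlier sketch; the augmentation magnitudes and the use of tensorflow_addons for the CohenKappa metric are assumptions, not taken from the repo:

```python
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to 0-1
    rotation_range=20,       # illustrative magnitude
    horizontal_flip=True,
    width_shift_range=0.1,   # illustrative magnitude
    height_shift_range=0.1,  # illustrative magnitude
)

model = head_1  # Flatten + Dense(512) head from the sketch above
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=[tfa.metrics.CohenKappa(num_classes=5)],
)
# model.fit(train_flow, validation_data=val_flow, epochs=200), where the
# flows come from train_gen.flow_from_dataframe(..., batch_size=64).
```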

Results: IMG

  • After the 175th epoch, the model gradually starts to overfit. By the 200th epoch it achieves loss = 0.8804, cohen_kappa = 0.5358, val_loss = 0.9585, val_cohen_kappa = 0.5346. Total training time for 200 epochs ≈ 5 hours.
  • This approach gives suboptimal results. Increasing the number of epochs, tuning other hyperparameters, or swapping in a different pre-trained model doesn't seem to help: training is time consuming and the model isn't generalizing well on the validation data. The Flatten layer & hidden layer in the head add a huge number of trainable parameters (51,383,301), making the model more complex & prone to overfitting.

Approach 2

IMG

Each model was trained for 30 epochs. Metric evaluation on the ship dataset (the best results obtained within the 30-epoch training):

| Model    | Training Loss | Training Kappa | Training Accuracy | Validation Loss | Validation Kappa | Validation Accuracy |
|----------|---------------|----------------|-------------------|-----------------|------------------|---------------------|
| Xception | 0.0746        | 0.9608         | 0.9697            | 0.3965          | 0.8963           | 0.9200              |
| VGG16    | 0.3257        | 0.8528         | 0.8867            | 0.3632          | 0.8375           | 0.8731              |
| ResNet50 | 0.1016        | 0.9532         | 0.9639            | 0.6208          | 0.8704           | 0.8998              |

Xception obtained the highest Kappa score on the validation data; its other metrics and loss values are also the best of the three. Hence, I chose Xception as the baseline.

After multiple experiments, I obtained the final model with the following set of hyperparameters on Xception (sketched after the list):

  • Augmentations used = rotation, horizontal flip, width shift, height shift, zoom, vertical flip + pixel values scaled to 0-1
  • Batch size = 64
  • Learning rate = 3e-4
  • Optimizer = Adam
  • Loss = Categorical Cross Entropy
  • Metric = Cohen Kappa, F1 Score, Categorical Accuracy
  • Epochs = 50
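
A minimal sketch of the final configuration, assuming Keras with tensorflow_addons for the Kappa and F1 metrics; the input size and augmentation magnitudes are illustrative assumptions:

```python
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

base = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False  # freeze the pre-trained convolutional base

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),  # approach-2 head
    layers.Dense(5, activation="softmax"),
])

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to 0-1
    rotation_range=20,       # illustrative magnitude
    horizontal_flip=True,
    vertical_flip=True,
    width_shift_range=0.1,   # illustrative magnitude
    height_shift_range=0.1,  # illustrative magnitude
    zoom_range=0.2,          # illustrative magnitude
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-4),
    loss="categorical_crossentropy",
    metrics=[
        tfa.metrics.CohenKappa(num_classes=5),
        tfa.metrics.F1Score(num_classes=5, average="macro"),
        "categorical_accuracy",
    ],
)
# model.fit(..., epochs=50) with batch_size=64 generators.
```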

Results (Primary Metric = Cohen's kappa; Train: 85%, Validation: 15% of Training dataset):

Model Training Loss Training Kappa Training F1 Training Accuracy Validation Loss Validation Kappa Validation F1 Validation Accuracy
Xception 0.0956 0.9559 0.9674 0.9659 0.2920 0.9081 0.9357 0.9286

Inference

The prediction submission file is present in the PS3 Deep Learning Solution directory along with the training and inference Python notebooks.
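
For completeness, a minimal inference sketch; the model filename, the submission layout ('image'/'category' columns), and the 1-5 label encoding are assumptions, not taken from the repo:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# compile=False avoids having to re-register the custom tfa metrics.
model = tf.keras.models.load_model("final_xception.h5", compile=False)
sub = pd.read_csv("test/sample_submission.csv")  # hypothetical path

# Load and scale each test image the same way as during training.
images = np.stack([
    tf.keras.preprocessing.image.img_to_array(
        tf.keras.preprocessing.image.load_img(
            f"test/images/{name}", target_size=(224, 224))
    ) / 255.0
    for name in sub["image"]
])

probs = model.predict(images, batch_size=64)
# Assumes classes are encoded 1-5; adjust to match the training encoding.
sub["category"] = probs.argmax(axis=1) + 1
sub.to_csv("submission.csv", index=False)
```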

Final Model Link

Download the final trained model from here.

Conclusion

Xception gives the best predictive performance based on Cohen's kappa metric. Hence, the trained model can be deployed in automated systems for the purpose of ship type identification. Scope for improvement:

  • Using an ensemble of multiple deep learning models.
  • Using different loss functions such as focal loss, which focuses learning on hard, misclassified examples (a sketch follows below).
  • Generating synthetic images for the classes with lower image counts.
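
As an illustration of the second point, a minimal sketch of categorical focal loss (Lin et al., 2017); the gamma and alpha values are common defaults, not tuned for this dataset:

```python
import tensorflow as tf

def categorical_focal_loss(gamma=2.0, alpha=0.25):
    """Focal loss for one-hot targets and softmax predictions."""
    def loss_fn(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)
        # Down-weight easy examples so training focuses on hard,
        # misclassified ones.
        weight = alpha * tf.pow(1.0 - y_pred, gamma)
        return tf.reduce_sum(weight * cross_entropy, axis=-1)
    return loss_fn

# Usage: model.compile(optimizer="adam", loss=categorical_focal_loss(), ...)
```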