Pollen-Data-Classification-using-Vision-Transformer

This work was done in fulfillment of the course project for Machine Vision and Image Understanding. The final report of the completed work can be found in this link.

In this project, we explore the current state-of-the-art vision transformer (ViT) on the pollen grain image classification challenge from ICPR 2020. We chose this topic because we believe the approach is novel for this challenge and because it helps us gain a deeper understanding of the multi-head self-attention mechanism for vision tasks. CNN-only classifiers do not take the global information of the image into account, whereas the transformer does, which helps make the decision making of the system more robust.
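To make the global-information argument concrete, the core of self-attention is sketched below: every patch token attends to every other token, so even a single layer mixes information across the whole image. This is a minimal PyTorch sketch for illustration, not code from this repository:

```python
import torch

def scaled_dot_product_attention(q, k, v):
    # scores[i, j] weights how much token i attends to token j, so every
    # patch embedding can pull information from every other patch,
    # regardless of spatial distance, in one layer.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return scores.softmax(dim=-1) @ v

# Toy example: 197 tokens (14x14 = 196 patches + 1 class token for
# ViT-B/16 at 224x224) with 64-dimensional attention heads.
q = k = v = torch.randn(197, 64)
out = scaled_dot_product_attention(q, k, v)
```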

The folders are arranged in the following manner:

The ViT16 and ViT32 folders each contain the following subfolders, one per fine-tuning configuration (a sketch of this setup follows the list):

  1. Version 1 --> Only the classification head is trained --> 4K Parameters
  2. Version 2 --> Classification head and 11th block are trained --> 7M Parameters
  3. Version 3 --> Classification head, (10, 11) blocks are trained --> 14M Parameters
  4. Version 4 --> Classification head, (9, 10, 11) blocks are trained --> 21M Parameters
  5. Version 5 --> Classification head, (8, 9, 10, 11) blocks are trained --> 28M Parameters
  6. Version 6 --> Classification head, (7, 8, 9, 10, 11) blocks are trained --> 35M Parameters
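Each version freezes the pretrained backbone and trains only the classification head plus the last few transformer blocks. Below is a minimal sketch of this partial-fine-tuning setup, assuming PyTorch and the timm implementation of ViT; the helper name and the class count are illustrative, not the repository's actual code:

```python
import timm

def build_partially_frozen_vit(model_name="vit_base_patch16_224",
                               num_classes=4, unfrozen_blocks=(9, 10, 11)):
    """Illustrative helper: freeze everything, then unfreeze the
    classification head and the given transformer blocks (0-indexed;
    block 11 is the last block of ViT-B). ViT32 would use
    "vit_base_patch32_224" instead."""
    model = timm.create_model(model_name, pretrained=True,
                              num_classes=num_classes)
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():        # classification head
        p.requires_grad = True
    for idx in unfrozen_blocks:              # chosen encoder blocks
        for p in model.blocks[idx].parameters():
            p.requires_grad = True
    return model

# Version 4: head + blocks 9, 10, 11 trainable (~21M parameters,
# since each ViT-B block holds roughly 7M parameters).
model = build_partially_frozen_vit(unfrozen_blocks=(9, 10, 11))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```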

Several experiments were performed; the results are given below. Each model was trained for 10 epochs, by which point performance had plateaued for all models. The best performance of each model is summarized below. The scores reported here are accuracy only; the F1-weighted and F1-macro scores can be found in the notebooks.

| Optimizer | Model | Part trained | Validation Score |
| --------- | ----- | ------------ | ---------------- |
| Adam | ViT16 | Head only | 92.4% |
| Adam | ViT16 | Head + 11th block | 94.2% |
| Adam | ViT16 | Head + (11,10) blocks | 94.5% |
| Adam | ViT16 | Head + (11,10,9) blocks | 94.5% |
| Adam | ViT16 | Head + (11,10,9,8) blocks | 94.6% |
| Adam | ViT16 | Head + (11,10,9,8,7) blocks | 94.9% |
| Adam | ViT32 | Head only | 92.3% |
| Adam | ViT32 | Head + 11th block | 94.5% |
| Adam | ViT32 | Head + (11,10) blocks | 94.1% |
| Adam | ViT32 | Head + (11,10,9) blocks | 94.5% |
| Adam | ViT32 | Head + (11,10,9,8) blocks | 94.9% |
| Adam | ViT32 | Head + (11,10,9,8,7) blocks | 94.5% |
| AdaBelief | ViT16 | Head only | 92.7% |
| AdaBelief | ViT16 | Head + 11th block | 94.5% |
| AdaBelief | ViT16 | Head + (11,10) blocks | 95.0% |
| AdaBelief | ViT16 | Head + (11,10,9) blocks | 95.1% |
| AdaBelief | ViT16 | Head + (11,10,9,8) blocks | 94.7% |
| AdaBelief | ViT16 | Head + (11,10,9,8,7) blocks | 95.2% |
| AdaBelief | ViT32 | Head only | 91.9% |
| AdaBelief | ViT32 | Head + 11th block | 94.5% |
| AdaBelief | ViT32 | Head + (11,10) blocks | 94.6% |
| AdaBelief | ViT32 | Head + (11,10,9) blocks | 95.2% |
| AdaBelief | ViT32 | Head + (11,10,9,8) blocks | 95.1% |
| AdaBelief | ViT32 | Head + (11,10,9,8,7) blocks | 95.6% |
| AdaBelief | ViT32 | Head + (11,10,9,8,7,6) blocks | 94.6% |
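The two optimizers in the table differ only in the update rule; AdaBelief is a drop-in replacement for Adam. Below is a minimal sketch of its use and of the metric computation, assuming the adabelief-pytorch and scikit-learn packages; the hyperparameter values are illustrative (not necessarily those used in the notebooks), and `model` refers to the sketch above:

```python
from adabelief_pytorch import AdaBelief
from sklearn.metrics import accuracy_score, f1_score

# AdaBelief is used exactly like torch.optim.Adam; only the trainable
# (unfrozen) parameters are passed to the optimizer.
optimizer = AdaBelief(filter(lambda p: p.requires_grad, model.parameters()),
                      lr=1e-4, eps=1e-16,
                      weight_decouple=True, rectify=True)

def summarize(y_true, y_pred):
    """Accuracy plus the F1 variants reported in the notebooks,
    computed on collected validation predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_weighted": f1_score(y_true, y_pred, average="weighted"),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }
```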

The best trained models and the latest checkpoints can be found in this link.

Best Ensembled Model Performance on Validation and Test Set

More details on the models selected for ensembling and testing are in this notebook.

Models selected:

  1. Model 1 - ViT16, Version 4
  2. Model 2 - ViT16, Version 6
  3. Model 3 - ViT32, Version 5
  4. Model 4 - ViT32, Version 6

Only the best model combinations and the corresponding test results are reported; a sketch of the ensembling scheme follows the table.

| Model Combination | Validation Score | Test Score |
| ----------------- | ---------------- | ---------- |
| 1, 4 | 96.3652% | 94.5922% |
| 1, 3, 4 | 96.4539% | 94.8582% |
| 1, 2, 3, 4 | 96.4539% | 94.9468% |
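The notebook linked above documents the exact ensembling procedure; a common scheme, assumed in the sketch below, is to average the softmax probabilities of the member models and take the argmax (PyTorch, names illustrative):

```python
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    """Average the class probabilities of the member models (an assumed
    scheme; see the notebook for the actual one) and pick the most
    probable class for each image in the batch."""
    probs = torch.stack([m(images).softmax(dim=-1) for m in models])
    return probs.mean(dim=0).argmax(dim=-1)
```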