I wanted to try the ArcFace loss function, so I tested it together with a CoAtNet-3 model and a GeM pooling layer. The model reaches a cross-entropy loss of 0.0049 over about two folds of cross-validation, with 100% accuracy on each fold.
ArcFace: https://arxiv.org/abs/1801.07698
CoAtNet: https://arxiv.org/abs/2106.04803
GeM pooling: https://arxiv.org/abs/1711.02512
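
For anyone unfamiliar with the two pieces around the backbone, here is a minimal sketch of what GeM pooling and the ArcFace margin compute. This is not my training code: it is plain Python on lists so the arithmetic stays visible, and the function names (`gem_pool`, `arcface_logits`, `cross_entropy`) as well as the values `p=3.0`, `s=30.0`, and `m=0.5` are illustrative assumptions. In a real setup these would be tensor operations with learnable class weights (and often a learnable `p`).

```python
import math


def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling of one channel's activations.

    p=1 gives average pooling; larger p moves the result toward max pooling.
    The default p is an illustrative assumption.
    """
    clamped = [max(v, eps) for v in values]  # keep values positive before the power
    mean_pow = sum(v ** p for v in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def arcface_logits(embedding, class_weights, target, s=30.0, m=0.5):
    """Scaled cosine logits with an additive angular margin on the target class.

    s (scale) and m (margin, in radians) are illustrative values.
    This sketch omits the special handling many implementations add
    for the case theta + m > pi.
    """
    e = l2_normalize(embedding)
    logits = []
    for idx, w in enumerate(class_weights):
        wn = l2_normalize(w)
        cos = sum(a * b for a, b in zip(e, wn))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if idx == target:
            theta = math.acos(cos)
            cos = math.cos(theta + m)  # penalize the target class angle
        logits.append(s * cos)
    return logits


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample, computed stably."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    pooled = gem_pool([0.2, 1.5, 0.7, 3.0])
    logits = arcface_logits([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
    print(pooled, logits, cross_entropy(logits, 0))
```

The margin makes the target class harder to score during training, so the loss stays higher than with plain softmax until the embedding sits well inside its class's angular region.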