This repository contains code for the 54th place solution in the Google - Isolated Sign Language Recognition competition on Kaggle.
The final model was an ensemble of 2 transformers trained on different seeds.
Stochastic Weight Averaging (SWA) with a cyclical learning rate yields more generalizable models (e.g., start averaging after the first 10 epochs; see the sketch after the links below).
- Paper: Averaging Weights Leads to Wider Optima and Better Generalization
- A good article by Max Pechyonkin explaining this idea
- TensorFlow implementation
- Solution that used this idea: Ohkawa3's 4th place
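A minimal sketch of this setup, assuming TensorFlow Addons (`tfa.optimizers.SWA`, `tfa.optimizers.CyclicalLearningRate`); the stand-in model, the 250-class output, and all schedule constants below are illustrative placeholders, not the competition code:

```python
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

STEPS_PER_EPOCH = 100  # placeholder: dataset_size / batch_size
WARMUP_EPOCHS = 10     # start averaging only after this many epochs

# Cyclical LR: triangular schedule oscillating between the two bounds.
clr = tfa.optimizers.CyclicalLearningRate(
    initial_learning_rate=1e-4,
    maximal_learning_rate=1e-3,
    step_size=2 * STEPS_PER_EPOCH,  # half-cycle length, in optimizer steps
    scale_fn=lambda x: 1.0,         # constant amplitude (plain triangular)
)

# SWA wrapper: snapshots weights once per epoch after the warm-up.
opt = tfa.optimizers.SWA(
    tf.keras.optimizers.Adam(clr),
    start_averaging=WARMUP_EPOCHS * STEPS_PER_EPOCH,
    average_period=STEPS_PER_EPOCH,
)

# Tiny stand-in model (250 sign classes, as in the competition).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(250, activation="softmax"),
])
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

# Copies the running weight average back into the model when saving.
avg_ckpt = tfa.callbacks.AverageModelCheckpoint(
    update_weights=True, filepath="swa_model.h5"
)

# Dummy data sized so there are exactly STEPS_PER_EPOCH steps per epoch.
x = np.random.rand(800, 32).astype("float32")
y = np.random.randint(0, 250, size=800)
model.fit(x, y, batch_size=8, epochs=12, callbacks=[avg_ckpt])
```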
Spend time ensembling NNs on local CV
- Only did this on the last day of the competition
- If you can decrease the size of the NN without degrading CV, this seems like an easy way to boost performance (a minimal ensembling sketch follows this list)
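A minimal sketch of equal-weight probability averaging, the simplest version of this idea; the tiny models and random validation batch below are stand-ins for the two seed-varied transformers and the local CV data:

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 250  # the competition has 250 sign classes

def make_model(seed):
    """Stand-in for one transformer trained with a given seed."""
    tf.keras.utils.set_random_seed(seed)
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

models = [make_model(0), make_model(1)]
x_val = np.random.rand(8, 32).astype("float32")  # placeholder CV batch

# Equal-weight ensemble: average the softmax probabilities, then argmax.
probs = np.mean([m.predict(x_val, verbose=0) for m in models], axis=0)
preds = probs.argmax(axis=1)
print(preds)
```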
DepthwiseConv1D and DepthwiseConv2D seemed to be good learnable layers that other competitors used to "smooth" the data (see the sketch after the links below)
- TensorFlow DepthwiseConv1D
- Used in many top solutions: Forrato's 2nd place, Ruslan Grimov's 3rd place
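A minimal sketch of the idea: a `DepthwiseConv1D` over the time axis acts as a learnable per-feature smoothing filter, since each channel is convolved independently. The sequence length and feature count below are assumptions, not the competition's actual input shape:

```python
import tensorflow as tf

# (frames, flattened landmark coordinates) -- placeholder shape
seq = tf.keras.Input(shape=(64, 126))
smoothed = tf.keras.layers.DepthwiseConv1D(
    kernel_size=5,       # each feature is smoothed over a 5-frame window
    padding="same",      # preserve the sequence length
    depth_multiplier=1,  # one filter per channel, no mixing across channels
)(seq)
model = tf.keras.Model(seq, smoothed)
model.summary()
```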
Some good points from Chris Deotte's 44th place discussion
- Test different batch sizes (a small sketch of this sweep follows the list): "Many Kagglers overlook the fact that changing batch size can make a big difference for models. We should always try 0.25x, 0.5x, 2x, 4x batch size and change the learning rate for those experiments to be 0.25x, 0.5x, 2x, 4x respectively."
- "NN always benefits from more data and data augmentation."
- "The easiest way to boost CV LB for NN is to ensemble NN"