At first glance, this challenge looked like a binary image segmentation task, so in the beginning I experimented with many model architectures and hyperparameters and ran hundreds of experiments (thanks to my 4x1080Ti rig). The final solution included three U-Net-like models with a few bells & whistles, ensembled with my teammate's models for the final submission.
- Be patient. Let the model train overnight instead of stopping after 10 epochs because it does not seem to converge. Don't meditate on the loss curves. Have a walk!
- Use heavier encoders. Top performers reported using SENet154 as an encoder. Funny enough, I had this encoder but never even tried it.
- Don't rely on single-fold validation results. Always do cross-validation.
- Keep an eye on the number of submissions. Late in the game I was unable to merge with another team because we had exceeded the submission limit.
- Understand where and why your model fails the most. In this challenge, it was key to understand and exploit the fact that there were no solid masks in the train set.
- Don't be too lazy. "Assemble mosaic" had been on my roadmap since the first week of the competition. To my shame, I didn't use it at all.
The implementation of the dual-path encoder with U-Net decoder was borrowed from https://github.com/selimsef/dsb2018_topcoders/tree/master/albu/src. The first model:
- WiderResNet38 encoder
- U-Net-like decoder (double [conv3x3 + BN + ReLU] blocks)
- OCNet module in the central bottleneck, with dilation factors [2, 4, 6]
- SCSE blocks in the decoder (see the sketch after this list)
- 3-channel input tensor [I, Y, I * Y]
- In addition to the mask output, the model also predicted salt presence
- An additional regularization loss was attached to the conv3 output of the encoder
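Below is a minimal PyTorch sketch of two of these decoder building blocks. The class names (`SCSEBlock`, `DecoderBlock`) and details such as the upsampling mode are my assumptions for illustration; the exact implementation lives in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SCSEBlock(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (hypothetical name)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel SE: global average pool -> bottleneck MLP -> per-channel gates
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial SE: 1x1 conv -> per-pixel gates
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)


class DecoderBlock(nn.Module):
    """U-Net decoder stage: upsample, concat skip, double [conv3x3 + BN + ReLU], SCSE."""

    def __init__(self, in_channels: int, skip_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels + skip_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.scse = SCSEBlock(out_channels)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="nearest")  # assumed upsampling mode
        x = torch.cat([x, skip], dim=1)
        return self.scse(self.conv(x))
```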
The second model was a lighter variant of the first, without the OCNet bottleneck and the auxiliary loss:
- WiderResNet38 encoder
- U-Net-like decoder (double [conv3x3 + BN + ReLU] blocks)
- SCSE blocks in the decoder
- 3-channel input tensor [I, Y, I * Y]
- In addition to the mask output, the model also predicted salt presence
The train set was split into 5 folds based on salt area (a possible implementation is sketched below). I used an image size of 128x128 (resized with Lanczos). I experimented with padding but did not notice any improvement. I also tried 224x224 patches with padding, but I didn't run full validation on all folds and abandoned the idea.
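For illustration, here is a sketch of how the stratified split and the resize might look. Binning the salt coverage into strata and the helper names (`make_folds`, `resize_image`) are my assumptions, not the original code.

```python
import cv2
import numpy as np
from sklearn.model_selection import StratifiedKFold


def make_folds(masks: np.ndarray, n_folds: int = 5, n_bins: int = 10) -> np.ndarray:
    """Assign a fold index to every image, stratified by salt coverage."""
    # Fraction of salt pixels per mask, binned into discrete strata
    coverage = masks.reshape(len(masks), -1).mean(axis=1)
    strata = np.digitize(coverage, np.linspace(0, 1, n_bins))
    folds = np.zeros(len(masks), dtype=int)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=42)
    for fold, (_, valid_idx) in enumerate(skf.split(np.zeros(len(masks)), strata)):
        folds[valid_idx] = fold
    return folds


def resize_image(image: np.ndarray, size: int = 128) -> np.ndarray:
    """Resize a 101x101 tile to the training resolution with Lanczos interpolation."""
    return cv2.resize(image, (size, size), interpolation=cv2.INTER_LANCZOS4)
```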
There were 3 losses, combined as in the sketch after this list:
- Mask loss (BCE/Jaccard/Lovasz) with weight 1.0
- Classification loss (BCE) with weight 0.5
- Auxiliary loss (BCE) with weight 0.1
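A minimal sketch of how the three terms might be combined. The soft-Jaccard formulation and the function names are my assumptions; per the list above, the mask term could also be Lovasz.

```python
import torch
import torch.nn.functional as F


def jaccard_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Soft Jaccard (IoU) loss computed on sigmoid probabilities."""
    probs = logits.sigmoid()
    intersection = (probs * targets).sum()
    union = probs.sum() + targets.sum() - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def total_loss(mask_logits, class_logits, aux_logits, mask_target, class_target, aux_target):
    """Weighted sum: mask loss x 1.0, classification loss x 0.5, auxiliary loss x 0.1."""
    mask_loss = (
        F.binary_cross_entropy_with_logits(mask_logits, mask_target)
        + jaccard_loss(mask_logits, mask_target)
    )
    class_loss = F.binary_cross_entropy_with_logits(class_logits, class_target)
    aux_loss = F.binary_cross_entropy_with_logits(aux_logits, aux_target)
    return 1.0 * mask_loss + 0.5 * class_loss + 0.1 * aux_loss
```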
Training was done in 3 stages:
- Warm-up training for 50 epochs with a BCE mask loss and Adam.
- Main training for 250 epochs with a BCE+Jaccard mask loss and Adam.
- Fine-tuning with 5 cycles of cosine annealing and warm restarts (sketched after this list).
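A sketch of the fine-tuning stage using PyTorch's built-in scheduler. The learning rate, the number of epochs per cycle, and the single-output model/criterion are simplifying assumptions.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts


def finetune(model, train_loader, criterion, device, cycles=5, epochs_per_cycle=50):
    """Fine-tuning: cosine annealing with warm restarts, one weight snapshot per cycle."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr
    scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=epochs_per_cycle)
    for epoch in range(cycles * epochs_per_cycle):
        model.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # restarts the learning rate every `epochs_per_cycle` epochs
        if (epoch + 1) % epochs_per_cycle == 0:
            # Predictions from these snapshots are averaged later
            cycle = (epoch + 1) // epochs_per_cycle
            torch.save(model.state_dict(), f"snapshot_cycle_{cycle}.pth")
```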
For prediction, I used horizontal flip TTA. Predictions after the 5 annealing cycles and the main training phase were averaged. In total, there were 60 predictions per model (6 weight snapshots x 5 folds x 2 TTA).
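A sketch of this prediction scheme, simplified to a model with a single mask output; the function names are mine.

```python
import torch


@torch.no_grad()
def predict_with_flip_tta(model, images: torch.Tensor) -> torch.Tensor:
    """Average mask probabilities over the original and a horizontally flipped input."""
    pred = model(images).sigmoid()
    # dims=[3] flips along the width axis of an NCHW tensor
    pred_flip = torch.flip(model(torch.flip(images, dims=[3])).sigmoid(), dims=[3])
    return 0.5 * (pred + pred_flip)


@torch.no_grad()
def predict_snapshots(model, checkpoint_paths, images: torch.Tensor) -> torch.Tensor:
    """Average TTA predictions over several weight snapshots of one fold."""
    preds = []
    for path in checkpoint_paths:
        model.load_state_dict(torch.load(path))
        model.eval()
        preds.append(predict_with_flip_tta(model, images))
    return torch.stack(preds).mean(dim=0)
```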
Masks were conditionally zeroed when the classifier predicted an empty mask. Non-zero masks were postprocessed with the binary_fill_holes method from scipy.ndimage, as in the sketch below.
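A sketch of this postprocessing step; the 0.5 thresholds are my assumptions.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes


def postprocess(mask_prob: np.ndarray, salt_prob: float,
                mask_threshold: float = 0.5, class_threshold: float = 0.5) -> np.ndarray:
    """Zero the mask if the classifier predicts 'no salt', otherwise fill holes."""
    if salt_prob < class_threshold:
        return np.zeros_like(mask_prob, dtype=np.uint8)
    mask = (mask_prob > mask_threshold).astype(np.uint8)
    return binary_fill_holes(mask).astype(np.uint8)
```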
Things that did not work:
- CRF postprocessing
- Geometric / Harmonic mean at ensembling
- Mixup augmentation
- Xception encoder
- DeepLab-like models
- Resnet34 :)
- Threshold tuning
Things that worked:
- Regularization with an auxiliary loss
- Predicting salt/not-salt
Things I did not try:
- Stochastic weight averaging
- GANs for data augmentation
This repository will not be maintained. Use it at your own risk.