Health Data Hack

Introduction

This is our team’s fourth-place solution for the Health Data Hack (second task). The contest focused on segmenting colorectal cancer cells in high-resolution histological slices. The competition data was prepared by MIPT University, medtech.moscow and the Phystech School of Biological and Medical Physics.

Solution

Splitting images

The cutter.py module pads each side of the image to a multiple of patch_size and then slides a window of size patch_size over it. For training we used two patch sizes, 1024x1024 and 2048x2048, with 50% overlap.
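The padding and sliding-window steps can be sketched roughly as follows. This is a minimal illustration, not the actual contents of cutter.py: the function names pad_to_multiple and cut_patches are hypothetical, and it assumes zero padding on the bottom and right edges and a stride derived from the overlap fraction.

```python
import numpy as np


def pad_to_multiple(image, patch_size):
    """Zero-pad height and width up to the next multiple of patch_size.

    Assumption: padding goes on the bottom and right edges only.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % patch_size
    pad_w = (-w) % patch_size
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad, mode="constant")


def cut_patches(image, patch_size, overlap=0.5):
    """Return (patches, coords) from a sliding window over the padded image."""
    stride = int(patch_size * (1 - overlap))
    if stride < 1:
        raise ValueError("overlap is too large for this patch size")
    padded = pad_to_multiple(image, patch_size)
    h, w = padded.shape[:2]
    patches, coords = [], []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(padded[y:y + patch_size, x:x + patch_size])
            coords.append((y, x))  # top-left corner, useful for stitching
    return patches, coords
```

With overlap=0.5 each window shares half of its area with the next one along each axis, so interior pixels are covered by several patches and their predictions can be merged when stitching the mask back together.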

Thresholding

We filtered the patches by requiring a minimum percentage of tissue pixels in each one. After trying several thresholds (40%, 30%, 20%, etc.), we found that a 30% threshold removed the most redundancy while preserving the useful information.
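A filter of this kind might look like the sketch below. How tissue pixels were actually identified is not stated here, so the sketch assumes that tissue is anything darker than a near-white background; the names tissue_fraction and keep_patch and the background_level value of 220 are hypothetical.

```python
import numpy as np


def tissue_fraction(patch, background_level=220):
    """Fraction of pixels darker than a near-white background level.

    Assumption: background is bright, tissue is darker. The real
    criterion used in the solution may differ.
    """
    gray = patch.mean(axis=-1) if patch.ndim == 3 else patch
    return float((gray < background_level).mean())


def keep_patch(patch, min_tissue=0.30, background_level=220):
    """Keep a patch only if at least min_tissue of its pixels are tissue."""
    return tissue_fraction(patch, background_level) >= min_tissue
```

Patches that fail keep_patch would simply be dropped from the training set, which removes mostly empty background tiles.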

TTA

For TTA we averaged the predictions for the original image and its augmented versions:

Horizontally flipped
Vertically flipped
Rotated (90, 180, 270 angles)
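The averaging over these augmentations can be sketched as follows. This is an illustration only: tta_predict is a hypothetical name, and predict stands for any callable that maps an H x W (x C) array to an H x W probability map. Each prediction is mapped back to the original orientation before averaging.

```python
import numpy as np


def tta_predict(predict, image):
    """Average predictions over the original image, flips and rotations."""
    # Each entry is (forward transform, inverse transform).
    transforms = [
        (lambda a: a, lambda a: a),                             # original
        (np.fliplr, np.fliplr),                                 # horizontal flip
        (np.flipud, np.flipud),                                 # vertical flip
        (lambda a: np.rot90(a, 1), lambda a: np.rot90(a, -1)),  # 90 degrees
        (lambda a: np.rot90(a, 2), lambda a: np.rot90(a, -2)),  # 180 degrees
        (lambda a: np.rot90(a, 3), lambda a: np.rot90(a, -3)),  # 270 degrees
    ]
    preds = [inverse(predict(forward(image))) for forward, inverse in transforms]
    return np.mean(preds, axis=0)
```

Undoing each transform before averaging matters for segmentation: without it, the masks from the flipped and rotated inputs would not line up pixel by pixel.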

Final ensemble

Our final solution contains two models:

Unet++ with EffNetb7 backbone and 2048x2048 patch size.
Unet++ with EffNetb7 backbone and 1024x1024 patch size.

We tried several ensembling strategies (MaxProb, MinProb, MeanProb); simple averaging achieved the best score.
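The three strategies differ only in how the per-pixel probabilities of the models are combined. A minimal sketch, assuming each model outputs a probability map of the same shape (the ensemble function and its mode names are hypothetical):

```python
import numpy as np


def ensemble(prob_maps, mode="mean"):
    """Combine per-model probability maps pixel by pixel."""
    stacked = np.stack(prob_maps, axis=0)  # shape: (n_models, H, W)
    if mode == "mean":
        return stacked.mean(axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    if mode == "min":
        return stacked.min(axis=0)
    raise ValueError(f"unknown mode: {mode}")
```

The combined map would then be thresholded to obtain the final binary mask; the threshold value is not given here.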

Project structure

FirstLook.ipynb - exploratory data analysis notebook
Training.ipynb - notebook for training all models
Inference.ipynb - inference notebook
productions.py - preparing test data for prediction
train_functions.py - module for training and validation
modeling
- cutter.py - module for splitting images
- losses.py - custom loss functions
- metrics.py - custom metrics
- models.py - custom models
utils
- cfgtools.py - configuration file
- datagenerator.py - module for preparing data for training
- dataset.py - module for preparing data for training

Additional data: train data, test data, weights, configs, presentation, text

lnfin/HealtDataHack