UBCO-Competition: A Python repository from brendanartley

20th Place Solution

Our solution for the UBC-Ocean Competition is based on a multiple instance learning (MIL) architecture with attention pooling. We use an ensemble of efficientnet_b2, tf_efficientnetv2_b2.in1k and regnety_016.tv2_in1k backbones trained on sequences (bags) of 8 images of 1280 x 1280 pixels each, and we ignore the Other class. We also apply light test-time augmentation (TTA) during inference: rot90, flips, transpose, and random image order.
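The numpy code below sketches two pieces of the setup just described. The first is attention pooling over a bag of tile embeddings. The second is TTA that averages predictions over rot90/flip variants, with the tile order shuffled for each variant. The write-up does not give the exact attention variant. This sketch assumes the tanh attention from Ilse et al. (2018), "Attention-based Deep Multiple Instance Learning". `V`, `w`, `head` and `toy_predict` are hypothetical stand-ins for learned weights and the real backbone.

```python
import numpy as np


def attention_mil_pool(features, V, w):
    """Attention pooling over a bag of tile embeddings.

    features: (n_tiles, d) embeddings, one per tile.
    V: (d, h) projection and w: (h,) scoring vector (learned with the backbone in practice).
    Returns the pooled (d,) bag embedding and the (n_tiles,) attention weights.
    """
    scores = np.tanh(features @ V) @ w        # one scalar score per tile
    scores = scores - scores.max()            # stabilise the softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    return attn @ features, attn


def dihedral_views(bag):
    """Yield the 8 rot90/flip variants of a (n_tiles, H, W, C) bag.

    Four rotations, each with and without a flip, cover all 8 symmetries of a
    square, so the transpose is one of the flipped variants.
    """
    for k in range(4):
        rotated = np.rot90(bag, k=k, axes=(1, 2))
        yield rotated
        yield rotated[:, :, ::-1]             # flip along the width axis


def tta_predict(predict_fn, bag, rng):
    """Average predict_fn over the 8 views, shuffling the tile order for each view."""
    preds = [predict_fn(view[rng.permutation(len(view))]) for view in dihedral_views(bag)]
    return np.mean(preds, axis=0)


# Hypothetical toy model: global-average-pool "backbone" + attention pooling + linear head.
rng = np.random.default_rng(0)
V, w, head = rng.normal(size=(3, 4)), rng.normal(size=4), rng.normal(size=(3, 5))


def toy_predict(bag):
    feats = bag.mean(axis=(1, 2))             # (n_tiles, C) tile embeddings
    pooled, _ = attention_mil_pool(feats, V, w)
    logits = pooled @ head
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()                # 5-class softmax


bag = rng.random((8, 16, 16, 3))              # a bag of 8 (tiny) tiles
print(tta_predict(toy_predict, bag, rng))
```

Attention pooling is a weighted sum over tiles, so shuffling the tile order does not change the output. The toy backbone is also rotation invariant, so here every TTA view gives the same prediction. A real CNN backbone is not invariant, and the views then differ.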

Strategies

Efficient Tiling

We select WSI tiles with the darkest median pixel value, using darkness as a proxy for tissue content since the slide background is close to white. To make the pipeline more efficient, we use multiprocessing on 3 CPU cores and prefilter crop locations using the smaller thumbnail images. This prefiltering keeps the largest tissue region on the slide and ignores smaller tissue fragments.
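The tile selection above can be sketched as follows. First, threshold the thumbnail and keep only its largest connected tissue region. Next, treat each thumbnail grid cell that lies mostly on that region as a candidate tile. Finally, keep the k candidates whose full-resolution tiles have the darkest median. Several details are assumptions: the threshold of 220, 4-connectivity, the grid-cell mapping and the `read_tile` callable. Multiprocessing is omitted; one option would be to map the per-slide function over slides with `multiprocessing.Pool(3)`.

```python
import numpy as np
from collections import deque


def largest_component(mask):
    """Boolean mask of the largest 4-connected True region (simple BFS labelling)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    best_label, best_size, current = 0, 0, 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            current += 1
            labels[y, x] = current
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
            if size > best_size:
                best_label, best_size = current, size
    if best_size == 0:
        return np.zeros((h, w), dtype=bool)
    return labels == best_label


def candidate_cells(thumb, cell, tissue_thresh=220, min_frac=0.5):
    """Grid cells (row, col) of the thumbnail lying mostly on the largest tissue region."""
    gray = thumb.mean(axis=-1)
    tissue = largest_component(gray < tissue_thresh)   # tissue is darker than the background
    cells = []
    for r in range(gray.shape[0] // cell):
        for c in range(gray.shape[1] // cell):
            if tissue[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].mean() >= min_frac:
                cells.append((r, c))
    return cells


def select_darkest_tiles(read_tile, cells, k=8):
    """Keep the k candidate tiles with the lowest (darkest) median pixel value."""
    medians = [np.median(read_tile(r, c)) for r, c in cells]
    order = np.argsort(medians, kind="stable")[:k]
    return [cells[i] for i in order]


# Toy thumbnail: one large tissue region and one small, separate fragment on white.
thumb = np.full((40, 40, 3), 255, dtype=np.uint8)
thumb[:20, :20] = 100     # large region (kept)
thumb[30:, 30:] = 100     # small fragment (ignored)
cells = candidate_cells(thumb, cell=10)

# Hypothetical full-resolution reader: each tile is a constant intensity here.
fake_levels = {(0, 0): 90, (0, 1): 150, (1, 0): 60, (1, 1): 120}


def read_tile(r, c):
    return np.full((4, 4, 3), fake_levels[(r, c)], dtype=np.uint8)


print(select_darkest_tiles(read_tile, cells, k=2))  # [(1, 0), (0, 0)]
```

The pure-Python BFS keeps the sketch dependency-free. On real thumbnails, `scipy.ndimage.label` would be a faster way to find the connected regions.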

For TMAs, we take 5 central crops of 2560 x 2560 pixels and resize them to 1280 x 1280 (a 2x downsample) to match the WSI magnification.
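The write-up does not say how the 5 central crops are arranged. This sketch assumes one crop at the centre plus four crops shifted diagonally by a hypothetical `offset`. It also assumes the 2x resize is done by averaging 2x2 pixel blocks. The original pipeline may have used an image library's resize function instead.

```python
import numpy as np


def downsample_2x(img):
    """Halve height and width by averaging each 2x2 pixel block (returns float64)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def tma_crops(img, crop=2560, offset=None):
    """Five crops around the TMA centre (centre + 4 diagonal shifts), each downsampled 2x."""
    h, w = img.shape[:2]
    if h < crop or w < crop or crop % 2:
        raise ValueError("image must be at least crop x crop and crop must be even")
    offset = crop // 4 if offset is None else offset   # hypothetical shift size
    cy, cx = h // 2, w // 2
    crops = []
    for dy, dx in ((0, 0), (-offset, -offset), (-offset, offset), (offset, -offset), (offset, offset)):
        # Clamp the top-left corner so every crop stays inside the image.
        y0 = min(max(cy + dy - crop // 2, 0), h - crop)
        x0 = min(max(cx + dx - crop // 2, 0), w - crop)
        crops.append(downsample_2x(img[y0:y0 + crop, x0:x0 + crop]))
    return crops


# Small demo: crop=32 stands in for 2560, so each output is 16 x 16 instead of 1280 x 1280.
demo = tma_crops(np.zeros((64, 80, 3)), crop=32)
print([c.shape for c in demo])  # five (16, 16, 3) arrays
```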

Although efficient, the pipeline does not guarantee that the extracted tiles are informative for every image. We experimented with a lightweight tile classifier trained on the ~150 segmentation masks, but it did not improve tile selection.

Modeling

We trained each model for 20-30 epochs with heavy augmentations and SWA (Stochastic Weight Averaging). Most models were trained on all the WSIs and TMAs, but some were trained with synthetically generated TMAs (a.k.a. TMA planets) built from the supplemental masks. We would likely have explored TMA planets further, but we were skeptical of the mask quality and of how few masks there were relative to the total number of WSIs.
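At its core, SWA is an equal-weight running mean of the model weights taken at several points late in training. The numpy sketch below shows only that averaging step, with parameters stored as a plain dict. The write-up does not say which SWA implementation was used. In PyTorch, `torch.optim.swa_utils.AveragedModel` and Lightning's `StochasticWeightAveraging` callback provide this averaging. Real SWA also recomputes BatchNorm statistics afterwards, for example with `torch.optim.swa_utils.update_bn`.

```python
import numpy as np


class RunningWeightAverage:
    """Equal-weight running mean of parameter snapshots (the averaging step of SWA)."""

    def __init__(self):
        self.n = 0
        self.avg = None

    def update(self, params):
        """Fold one snapshot {name: array} into the running mean."""
        if self.avg is None:
            self.avg = {k: np.array(v, dtype=np.float64) for k, v in params.items()}
        else:
            for k, v in params.items():
                # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
                self.avg[k] += (np.asarray(v, dtype=np.float64) - self.avg[k]) / (self.n + 1)
        self.n += 1


swa = RunningWeightAverage()
for snapshot in ({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 9.0]}):
    swa.update(snapshot)
print(swa.avg["w"])  # element-wise mean of the three snapshots
```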

OOF Relabel + Remove

Based on Noli Alonso's comments, we used out-of-fold (OOF) predictions to remove ~5% of the images and relabel 8 images. Our denoising method is similar to the one used in the 1st place solution of the PANDA Competition.

# image_id -> corrected label
relabel_dict = {
    '15583': 'MC',
    '51215': 'LGSC',
    '21432': 'CC',
    '50878': 'LGSC',
    '19569': 'MC',
    '38097': 'EC',
    '29084': 'CC',
    '63836': 'LGSC',
}
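The write-up does not spell out the exact removal criterion. The numpy sketch below shows one simple OOF rule, which is an assumption. It drops the `drop_frac` fraction of images whose OOF probability for their assigned label is lowest, then applies manual relabels such as `relabel_dict`. The relabelled images are kept even if they fall in the drop set. `denoise_labels` is a hypothetical helper. `CLASSES` lists the five competition subtypes, with Other excluded as in the write-up.

```python
import numpy as np

CLASSES = ["CC", "EC", "HGSC", "LGSC", "MC"]


def denoise_labels(image_ids, labels, oof_probs, drop_frac=0.05, relabel=None):
    """Drop low-confidence images by OOF probability, then apply manual relabels.

    image_ids: list of str ids; labels: list of class names;
    oof_probs: (n_images, len(CLASSES)) out-of-fold class probabilities.
    Returns {image_id: label} for the images that are kept.
    """
    relabel = relabel or {}
    idx = np.array([CLASSES.index(label) for label in labels])
    p_true = oof_probs[np.arange(len(labels)), idx]       # OOF prob of the given label
    n_drop = int(round(drop_frac * len(labels)))
    drop = set(np.argsort(p_true, kind="stable")[:n_drop].tolist())
    cleaned = {}
    for i, (img_id, label) in enumerate(zip(image_ids, labels)):
        if img_id in relabel:                 # manual relabels take precedence
            cleaned[img_id] = relabel[img_id]
        elif i not in drop:
            cleaned[img_id] = label
    return cleaned
```

Usage, with `relabel_dict` passed as `relabel`: `denoise_labels(ids, labels, oof_probs, relabel=relabel_dict)`.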

External Data

The only external dataset we used was the Ovarian Carcinoma Histopathology Dataset (SFU). This dataset had 80 WSIs at 40x magnification from 6 different pathology centers.

Class distribution: {'HGSC': 30, 'CC': 20, 'EC': 11, 'MC': 10, 'LGSC': 9}
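A quick arithmetic check of the class distribution above: the counts sum to the 80 SFU WSIs, and HGSC makes up 37.5% of them.

```python
sfu_counts = {'HGSC': 30, 'CC': 20, 'EC': 11, 'MC': 10, 'LGSC': 9}
total = sum(sfu_counts.values())                       # 80 WSIs
shares = {k: v / total for k, v in sfu_counts.items()}
print(total, shares['HGSC'])                           # 80 0.375
```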

Did not work for us

Larger backbones
Lightweight tile classifier
Stain normalization (staintools, stainnet, etc.)
JPGs

Frameworks

Pytorch Lightning (training)
Weights + Biases (logging)
Timm (backbones)

Notes for future competitions (not part of write-up).

Improvements

Should have looked at using image previews for external data! There were many external TMA datasets that could have been used by cropping from these previews; see the 12th place solution. One really good source was tissuearray[.]com, which I found but clicked away from before looking at the previews!
Should have looked at backbones pretrained on histopathology images, which would have given stronger starting weights. See the 6th and 7th place solutions.
Should have explored the Other class more. Could have created Other class tiles from the stroma/necrosis areas of the segmentation masks, as in the 8th and 10th place solutions.
It seems a lot of top solutions used more tiles per WSI or multiple magnifications. See the 7th and 15th place solutions.
Should have done model error analysis/denoising earlier. Only explored mislabelled/noisy images in the last couple of weeks of the competition, and made large gains by simply dropping ~5% of images and relabelling 8. Did not explore other dropping methods.
Should have trusted the frozen StainNet attached to the MIL model! It seemed to help generalize to unseen staining methods.
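The untried idea of building Other class tiles from the stroma/necrosis areas of the segmentation masks could be sketched as below. The function scans a tile grid over a mask and keeps tiles that are almost tumor-free but mostly stroma or necrosis. It assumes an RGB mask encoding with channel 0 for tumor, 1 for stroma and 2 for necrosis, where nonzero means present. `other_tile_coords` and both thresholds are hypothetical.

```python
import numpy as np


def other_tile_coords(mask, tile, max_tumor=0.05, min_non_tumor=0.5):
    """Top-left (y, x) of grid tiles that are nearly tumor-free and mostly stroma/necrosis.

    mask: (H, W, 3) array, assumed encoding: channel 0 tumor, 1 stroma, 2 necrosis.
    """
    tumor = mask[..., 0] > 0
    non_tumor = (mask[..., 1] > 0) | (mask[..., 2] > 0)
    coords = []
    for y in range(0, mask.shape[0] - tile + 1, tile):
        for x in range(0, mask.shape[1] - tile + 1, tile):
            t = tumor[y:y + tile, x:x + tile].mean()        # tumor fraction in the tile
            s = non_tumor[y:y + tile, x:x + tile].mean()    # stroma + necrosis fraction
            if t <= max_tumor and s >= min_non_tumor:
                coords.append((y, x))
    return coords


# Toy mask: tumor top-left, stroma top-right, necrosis bottom-left, empty bottom-right.
mask = np.zeros((20, 20, 3), dtype=np.uint8)
mask[:10, :10, 0] = 255
mask[:10, 10:, 1] = 255
mask[10:, :10, 2] = 255
print(other_tile_coords(mask, tile=10))  # [(0, 10), (10, 0)]
```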

Positives