Ovarian cancer classification based on histopathology images - Transfer learning strategy

MIDL2020

This repository implements the paper Classification of Epithelial Ovarian Carcinoma Whole-Slide Pathology Images Using Deep Transfer Learning. Citation information:

@misc{wang2020classification,
    title={Classification of Epithelial Ovarian Carcinoma Whole-Slide Pathology Images Using Deep Transfer Learning},
    author={Yiping Wang and David Farnell and Hossein Farahani and Mitchell Nursey and Basile Tessier-Cloutier and Steven J. M. Jones and David G. Huntsman and C. Blake Gilks and Ali Bashashati},
    year={2020},
    eprint={2005.10957},
    archivePrefix={arXiv},
    primaryClass={eess.IV}
}

Presentation and Slides

Reviews

Our work is inspired by ProGAN and fast.ai.

Prerequisites

  • Linux or macOS
  • Python 3.5.2
  • PyTorch 1.3.0+cu92
  • scikit-learn 0.22.1
  • NVIDIA GPU + CUDA CuDNN

Get Started

Installation

  • Install the required packages

    • pip install torch==1.3.0+cu92 torchvision==0.4.1+cu92 -f https://download.pytorch.org/whl/torch_stable.html
    • pip install -r requirements.txt
  • Clone this repo

mkdir wsi_classification
cd wsi_classification
git clone https://github.com/AIMLab-UBC/MIDL2020
cd MIDL2020

Repo Structure

  • patch_level.py is the entry point for the patch-level classifier. In this file, the train function initializes the model, creates the data loaders, optimizes the model weights, and logs training information. The evaluate function loads the trained model and applies it to the validation or test set.

  • slide_level.py is the entry point for the slide-level classifier. In this file, the main function initializes the model, creates a data loader, optimizes the model weights, and computes the 6-fold cross-validation results.

  • config.py reads arguments from the user, so you can customize any hyperparameter or setting through config.py.

  • models is the sub-module that contains the model implementations.

    • Inside the models sub-module, models/base_model.py is the template class for models/models.py. It defines the expected behaviours of a model, such as forward, optimize_weights, load_state, and save_state. Every model should inherit from BaseModel.

    • Inside the models sub-module, models/models.py is the interface to models/networks.py. It initializes the networks defined in models/networks.py and optimizes their weights. The models in models/models.py are summarized in the following table.

      Models | Usage
      CountBasedFusionModel | Slide-level model wrapper. It assumes an N*C input matrix, where N is the number of slides and C is the number of classes.
      DeepModel | Patch-level model wrapper. It is a simple wrapper for out-of-the-box deep learning models. It takes one patch per forward pass and follows the standard deep learning training protocol.
    • Inside the models sub-module, models/networks.py implements the networks. It only defines network architectures and forward functions, and should be kept as simple as possible.

  • data is the sub-module that contains the data loaders and the pre-processing and post-processing functions.

    • Inside the data sub-module, data/base_dataset.py is the template class for the other data loaders. It defines the data loader behaviour and assigns the config.py settings to the current data loader. It also modifies the patch ids when required, for example to change the scale used by a multi-scale model.

    • Inside the data sub-module, data/patch_dataset.py is the main data loader for patch images or patch features.

      Dataset | Usage
      SubtypePatchDataset | Loads patch images from H5 files and applies the pre-processing steps.
    • Inside the data sub-module, data/create_patient_groups.py is a helper script that splits the dataset by patient.

  • utils is the sub-module that contains small utility functions.

    • Inside the utils sub-module, utils/utils.py contains a wide range of small helper functions. Contributors should add further helpers here, each with a docstring. Functions in this file should be self-explanatory and as general as possible.

    • Inside the utils sub-module, utils/subtype_enum.py defines the enum for the classes.
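
As a rough illustration of the template pattern described for models/base_model.py in the list above, the sketch below uses Python's abc module. Only the method names forward, optimize_weights, load_state, and save_state come from that description. The signatures, the ToyModel subclass, and the dictionary-based state are assumptions chosen to keep the example free of PyTorch, so the real BaseModel may look different.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Template listing the behaviours every model is expected to provide."""

    @abstractmethod
    def forward(self, batch):
        """Return predictions for a batch of inputs."""

    @abstractmethod
    def optimize_weights(self, batch, labels):
        """Run one optimization step and return the loss."""

    @abstractmethod
    def save_state(self):
        """Return a serializable snapshot of the model."""

    @abstractmethod
    def load_state(self, state):
        """Restore the model from a snapshot."""


class ToyModel(BaseModel):
    """Hypothetical one-parameter model y = w * x, for illustration only."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w
        self.lr = lr

    def forward(self, batch):
        return [self.w * x for x in batch]

    def optimize_weights(self, batch, labels):
        preds = self.forward(batch)
        n = len(batch)
        # Mean squared error and its gradient with respect to w.
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / n
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, labels, batch)) / n
        self.w -= self.lr * grad
        return loss

    def save_state(self):
        return {"w": self.w}

    def load_state(self, state):
        self.w = state["w"]
```

Because the four methods are abstract, instantiating BaseModel directly raises a TypeError, which is one way to enforce that every concrete model implements the full contract.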

Patch extraction

First, define the enum in utils/subtype_enum.py so that it lists the classes of your dataset.
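
A minimal sketch of what such an enum could look like for the five subtypes reported in the tables on this page (CC, LGSC, EC, MC, HGSC). The class name SubtypeEnum, the helper label_of, and the integer values are assumptions; the actual definition in utils/subtype_enum.py may differ.

```python
from enum import Enum


class SubtypeEnum(Enum):
    """Hypothetical label mapping; order follows the table columns."""
    CC = 0
    LGSC = 1
    EC = 2
    MC = 3
    HGSC = 4


def label_of(subtype_name):
    """Map a subtype name such as 'MC' to its integer class label."""
    return SubtypeEnum[subtype_name.upper()].value
```

For a different dataset, the members would be replaced by that dataset's class names.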

Next, we extract 1024 * 1024 patches and downsample them to 512 * 512 and 256 * 256 using the extract_patches.py script. The script not only extracts the patches but also stores them in an H5 file for easy data transfer and management. Note that the script reads our own data annotation file, so the annotation parsing and checking code needs to be adapted for other datasets.

We store the patches in H5 files under keys of the form subtype_name/slide_id/patch_location_x_y, and use .txt files to store the data entry ids.
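
The sketch below shows one way to build and parse entry ids with the three-part layout described above. The exact spelling of the last component is not given on this page, so the form x_y (two integers joined by an underscore) and the helper names make_patch_id and parse_patch_id are assumptions.

```python
from collections import namedtuple

PatchId = namedtuple("PatchId", ["subtype", "slide_id", "x", "y"])


def make_patch_id(subtype, slide_id, x, y):
    """Join the components into 'subtype/slide_id/x_y'."""
    return "{}/{}/{}_{}".format(subtype, slide_id, x, y)


def parse_patch_id(patch_id):
    """Split 'subtype/slide_id/x_y' back into typed components."""
    subtype, slide_id, location = patch_id.split("/")
    x, y = location.split("_")
    return PatchId(subtype, slide_id, int(x), int(y))
```

Keeping the subtype and slide id inside the key means a plain .txt list of ids is enough to recover both the label and the slide of every patch.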

Patch-level: train, validation and test

The following bash script is used to invoke training, validation and test:

#!/bin/bash
chmod 775 ./patch_level.py
echo 'Two-stage model using split A'

# Arguments shared by both stages, for training and validation alike
COMMON_ARGS=(
    --deep_model DeepModel --deep_classifier two_stage --model_name_prefix split_a
    --use_pretrained --lr 0.0002 --epoch 20 --rep_intv 250
    --use_equalized_batch --n_eval_samples 2000 --is_multiscale_expert
    --dataset_dir /projects/ovcare/classification/ywang/midl_dataset/1024_resize
    --preload_image_file_name 1024_resize.h5
    --train_ids_file_name patch_ids/1_2_train_3_eval_train_ids.txt
    --val_ids_file_name patch_ids/1_2_train_3_eval_eval_0_ids.txt
    --log_dir /projects/ovcare/classification/ywang/project_log/1024_resize_log/
    --save_dir /projects/ovcare/classification/ywang/project_save/1024_resize_save/
)

echo 'Stage 1 - Patch Size 256 * 256 Training'
./patch_level.py "${COMMON_ARGS[@]}" --batch_size 64 --expert_magnification 256
echo 'Stage 1 - Patch Size 256 * 256 Validation'
./patch_level.py --mode Validation "${COMMON_ARGS[@]}" --batch_size 64 --expert_magnification 256

echo 'Stage 2 - Patch Size 512 * 512 Training'
./patch_level.py "${COMMON_ARGS[@]}" --batch_size 32 --expert_magnification 512
echo 'Stage 2 - Patch Size 512 * 512 Validation'
./patch_level.py --mode Validation "${COMMON_ARGS[@]}" --batch_size 32 --expert_magnification 512
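
config.py is described as reading these arguments from the user. The following is a hedged sketch of how a subset of the flags used in the script above could be declared with argparse. The flag names are taken from the commands; the types, the defaults (including the "Training" default for --mode), and the function name build_parser are assumptions and may not match config.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Patch-level classifier (sketch)")
    # Assumed default: the training commands pass no --mode flag.
    parser.add_argument("--mode", default="Training")
    parser.add_argument("--deep_model", default="DeepModel")
    parser.add_argument("--deep_classifier", default="two_stage")
    parser.add_argument("--model_name_prefix", default="")
    parser.add_argument("--lr", type=float, default=0.0002)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--epoch", type=int, default=20)
    parser.add_argument("--expert_magnification", type=int, default=256)
    # Boolean switches stay off unless the flag is present.
    parser.add_argument("--use_pretrained", action="store_true")
    parser.add_argument("--use_equalized_batch", action="store_true")
    parser.add_argument("--is_multiscale_expert", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["--mode", "Validation", "--batch_size", "32",
     "--expert_magnification", "512", "--use_pretrained"]
)
```

Passing a list to parse_args makes it easy to try flag combinations interactively before editing the bash script.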

Slide-level: train and test

We train random forests using 6-fold cross-validation on the results of the six patch-level test sets.

After changing the paths to the six patch-level results in slide_level.py, run python3 slide_level.py. It outputs the slide-level results and saves the trained model.
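
The N*C input described for CountBasedFusionModel can be pictured as one row per slide that holds how many patches were predicted as each class. The sketch below builds such a matrix from (slide_id, predicted_class) pairs. The input layout, the function name build_count_matrix, and the optional normalization are assumptions, and the actual feature construction in slide_level.py may differ.

```python
from collections import OrderedDict


def build_count_matrix(patch_predictions, n_classes, normalize=False):
    """Aggregate patch-level predictions into one count vector per slide.

    patch_predictions: iterable of (slide_id, predicted_class_index) pairs.
    Returns (slide_ids, matrix), where matrix[i][c] counts the patches of
    slide_ids[i] predicted as class c (or the fraction, if normalize=True).
    """
    counts = OrderedDict()
    for slide_id, pred in patch_predictions:
        row = counts.setdefault(slide_id, [0] * n_classes)
        row[pred] += 1
    slide_ids = list(counts)
    matrix = [counts[s] for s in slide_ids]
    if normalize:
        matrix = [[c / sum(row) for c in row] for row in matrix]
    return slide_ids, matrix


# Toy predictions for two hypothetical slides and five classes.
preds = [("s1", 0), ("s1", 0), ("s1", 4), ("s2", 2), ("s2", 2), ("s2", 2)]
slide_ids, X = build_count_matrix(preds, n_classes=5)
```

Each row of X, paired with the true subtype of the slide, would then be one training sample for a slide-level classifier such as a random forest.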

Our results

We include our patch-level and slide-level results in ./results/.

Datasets and Detailed Results

The epithelial ovarian carcinoma whole-slide pathology images used in this study will be made available at a later date, subject to institutional review board approval. We will update this page once the dataset is available.

Our dataset has the following distribution in terms of patients, slides, and 1024 * 1024 tumor patches:

Data Type CC LGSC EC MC HGSC Total
Patient 32 14 28 9 76 159
Slide 53 29 55 11 157 305
Patch 16.49% 16.31% 12.93% 10.96% 43.31% 161516

We first randomly divided the dataset by patient into three groups, denoted Group 1, Group 2, and Group 3.
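
A minimal sketch of a patient-level split into three groups. Patients of each subtype are shuffled and dealt out round-robin, so that no patient appears in two groups and each group receives a similar subtype mix. The function name split_patients, the input layout, the seed, and the round-robin stratification are assumptions; this is not the logic of data/create_patient_groups.py, and it does not reproduce the exact group sizes reported below.

```python
import random


def split_patients(patient_subtypes, n_groups=3, seed=0):
    """Assign each patient to exactly one group, stratified by subtype.

    patient_subtypes: dict mapping patient_id -> subtype name.
    Returns a list of n_groups lists of patient ids.
    """
    rng = random.Random(seed)
    by_subtype = {}
    for patient_id in sorted(patient_subtypes):
        by_subtype.setdefault(patient_subtypes[patient_id], []).append(patient_id)

    groups = [[] for _ in range(n_groups)]
    for subtype in sorted(by_subtype):
        patients = by_subtype[subtype]
        rng.shuffle(patients)
        # Deal patients of this subtype out one group at a time.
        for i, patient_id in enumerate(patients):
            groups[i % n_groups].append(patient_id)
    return groups


# Hypothetical patient ids, using the per-subtype patient counts from the table above.
counts = {"CC": 32, "LGSC": 14, "EC": 28, "MC": 9, "HGSC": 76}
patients = {"{}_{:03d}".format(s, i): s for s, n in counts.items() for i in range(n)}
groups = split_patients(patients)
```

Splitting by patient rather than by slide or patch keeps all tissue from one patient on the same side of a train/test boundary.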

Group 1 has the following distributions:

Data Type CC LGSC EC MC HGSC Total
Patient 11 5 10 3 26 55
Slide 20 8 14 4 54 100
Patch 15.23% 10.43% 11.44% 13.93% 48.98% 56034

Group 2 has the following distributions:

Data Type CC LGSC EC MC HGSC Total
Patient 11 5 10 3 26 55
Slide 16 8 26 3 60 113
Patch 12.30% 17.80% 15.51% 7.09% 47.30% 64855

Group 3 has the following distributions:

Data Type CC LGSC EC MC HGSC Total
Patient 10 4 8 3 24 49
Slide 17 13 15 4 43 92
Patch 24.91% 22.05% 10.89% 13.03% 29.12% 40627

Slide-level and Patch-Level Results

For slide-level classification, we only use the patch-level test set results (as shown in the above figure) to build the input matrix to train random forests, and we report the 6-fold cross-validation slide-level results.

Split | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
baseline 6-fold cross-validation | 83.02% | 65.52% | 54.55% | 54.55% | 80.25% | 73.77% | 0.5993 | 0.9391 | 0.6855 | 67.58%
stage-1 6-fold cross-validation | 79.25% | 79.31% | 61.82% | 54.55% | 85.99% | 78.69% | 0.6730 | 0.9375 | 0.7414 | 72.18%
stage-2 6-fold cross-validation | 86.79% | 100.00% | 74.55% | 81.82% | 90.45% | 87.54% | 0.8106 | 0.9641 | 0.8718 | 86.72%
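
The two accuracy columns can be related to the per-class columns with a short calculation. In the sketch below, Weighted Accuracy is assumed to weight each class by its number of slides (which is equivalent to overall accuracy), and Average Accuracy is assumed to be the unweighted mean over the five classes. These definitions are inferred from the numbers rather than stated on this page; under them, the baseline row and the slide counts from the dataset table reproduce the reported 73.77% and 67.58%.

```python
# Slide counts per subtype (dataset table) and baseline per-class accuracy in %.
slides = {"CC": 53, "LGSC": 29, "EC": 55, "MC": 11, "HGSC": 157}
per_class = {"CC": 83.02, "LGSC": 65.52, "EC": 54.55, "MC": 54.55, "HGSC": 80.25}


def weighted_accuracy(per_class, support):
    """Mean of per-class accuracies, weighted by class size."""
    total = sum(support.values())
    return sum(per_class[c] * support[c] for c in per_class) / total


def average_accuracy(per_class):
    """Unweighted mean of per-class accuracies."""
    return sum(per_class.values()) / len(per_class)


print(round(weighted_accuracy(per_class, slides), 2))  # 73.77
print(round(average_accuracy(per_class), 2))           # 67.58
```

The gap between the two numbers reflects class imbalance: HGSC accounts for 157 of the 305 slides, so it dominates the weighted figure.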

For patch-level classification, we employ a 3-fold cross-validation scheme with a tweak. We use two of the three patient groups as the training set and divide the remaining group equally by patient into two subgroups. One subgroup serves as the validation set and the other as the test set, and the two roles are then swapped. With three choices of held-out group and two role assignments each, we obtain 6 different training, validation, and test splits (Splits A to F below).
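
The construction of the six splits can be sketched as follows. The letters A to F and the order of held-out groups (3, then 2, then 1) follow the split tables on this page, where Splits A and B train on Groups 1 and 2. The function name make_splits and the dictionary keys are assumptions, and the held-out group is halved by simple slicing rather than by the patient-balanced division used in the study.

```python
def make_splits(groups):
    """Build six train/validation/test splits from three patient groups.

    groups: dict mapping group number (1, 2, 3) -> list of patient ids.
    Each group is held out once; its two halves swap the validation and
    test roles, giving 3 x 2 = 6 splits.
    """
    splits = {}
    names = iter("ABCDEF")
    for held_out in (3, 2, 1):
        train = [p for g in sorted(groups) if g != held_out for p in groups[g]]
        patients = groups[held_out]
        half = (len(patients) + 1) // 2
        first, second = patients[:half], patients[half:]
        for val, test in ((first, second), (second, first)):
            splits[next(names)] = {"train": train, "val": val, "test": test}
    return splits


# Hypothetical patient ids with the group sizes reported above (55, 55, 49).
groups = {
    1: ["g1_{}".format(i) for i in range(55)],
    2: ["g2_{}".format(i) for i in range(55)],
    3: ["g3_{}".format(i) for i in range(49)],
}
splits = make_splits(groups)
```

With these sizes, Split A trains on 110 patients and holds out 25 for validation and 24 for testing, and Split B swaps the last two sets.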

Split A Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 22 10 20 6 52 110
Slide 36 16 40 7 114 213
Patch 13.66% 14.39% 13.62% 10.26% 48.08% 120889

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 4 2 12 25
Slide 7 3 9 3 19 41
Patch 17.31% 27.79% 7.85% 31.73% 15.33% 14594

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 4 1 12 24
Slide 10 10 6 1 24 51
Patch 29.18% 18.83% 12.59% 2.55% 36.85% 26033

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 99.22% | 89.60% | 73.89% | 100.00% | 75.05% | 85.33% | 0.8015 | 0.9739 | 0.8546 | 87.55%
Stage-1 | 99.42% | 79.05% | 63.58% | 99.85% | 74.70% | 81.97% | 0.7543 | 0.9651 | 0.8105 | 83.32%
Stage-2 | 99.50% | 77.34% | 72.70% | 99.55% | 72.64% | 82.05% | 0.7568 | 0.9658 | 0.8243 | 84.34%
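
The Kappa column is not defined on this page. Assuming it denotes Cohen's kappa, it can be computed from a confusion matrix as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below uses a small made-up 3-class confusion matrix purely for illustration; it is not data from this study, and the function name cohen_kappa is an assumption.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    p_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up confusion matrix, for illustration only.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]
kappa = cohen_kappa(toy)
```

Unlike plain accuracy, kappa discounts the agreement a classifier would reach by chance given the class marginals, which matters when one class dominates the test set.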

Split B Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 22 10 20 6 52 110
Slide 36 16 40 7 114 213
Patch 13.66% 14.39% 13.62% 10.26% 48.08% 120889

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 4 1 12 24
Slide 10 10 6 1 24 51
Patch 29.18% 18.83% 12.59% 2.55% 36.85% 26033

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 4 2 12 25
Slide 7 3 9 3 19 41
Patch 17.31% 27.79% 7.85% 31.73% 15.33% 14594

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 89.90% | 76.58% | 81.05% | 20.89% | 58.78% | 58.84% | 0.4986 | 0.8969 | 0.5698 | 65.44%
Stage-1 | 89.59% | 74.04% | 75.72% | 44.58% | 50.38% | 63.89% | 0.5521 | 0.8997 | 0.6122 | 66.86%
Stage-2 | 86.26% | 64.32% | 63.23% | 37.93% | 60.84% | 59.13% | 0.4965 | 0.8823 | 0.5711 | 62.52%

Split C Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 21 9 18 6 50 104
Slide 37 21 29 8 97 192
Patch 19.30% 15.31% 11.21% 13.55% 40.63% 96661

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 6 3 5 2 13 29
Slide 7 3 13 2 25 50
Patch 6.87% 12.82% 12.82% 12.84% 54.64% 18990

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 5 1 13 26
Slide 9 5 13 1 35 63
Patch 14.55% 19.86% 16.62% 4.71% 44.26% 45865

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 95.11% | 54.81% | 80.17% | 96.76% | 63.07% | 70.52% | 0.6035 | 0.9335 | 0.7300 | 77.99%
Stage-1 | 84.44% | 32.51% | 77.52% | 97.45% | 81.99% | 72.51% | 0.6132 | 0.9276 | 0.7162 | 74.78%
Stage-2 | 97.06% | 58.49% | 78.55% | 97.64% | 68.81% | 73.84% | 0.6452 | 0.9507 | 0.7532 | 80.11%

Split D Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 21 9 18 6 50 104
Slide 37 21 29 8 97 192
Patch 19.30% 15.31% 11.21% 13.55% 40.63% 96661

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 5 1 13 26
Slide 9 5 13 1 35 63
Patch 14.55% 19.86% 16.62% 4.71% 44.26% 45865

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 6 3 5 2 13 29
Slide 7 3 13 2 25 50
Patch 6.87% 12.82% 12.82% 12.84% 54.64% 18990

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 64.29% | 83.37% | 48.71% | 99.84% | 80.76% | 78.30% | 0.6790 | 0.9473 | 0.7212 | 75.39%
Stage-1 | 76.55% | 79.10% | 40.04% | 98.24% | 87.14% | 80.77% | 0.7056 | 0.9423 | 0.7383 | 76.21%
Stage-2 | 65.06% | 84.11% | 33.76% | 95.41% | 88.33% | 80.10% | 0.6881 | 0.9277 | 0.7261 | 73.33%

Split E Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 21 9 18 6 50 104
Slide 33 21 41 7 103 205
Patch 17.16% 19.44% 13.73% 9.38% 40.30% 105482

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 6 3 5 2 13 29
Slide 11 3 7 3 27 51
Patch 12.62% 10.19% 11.73% 39.90% 25.57% 16690

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 5 1 13 26
Slide 9 5 7 1 27 49
Patch 16.33% 10.54% 11.32% 2.91% 58.91% 39344

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 79.29% | 88.15% | 26.77% | 99.74% | 65.01% | 66.47% | 0.5003 | 0.9149 | 0.6336 | 71.79%
Stage-1 | 61.00% | 95.37% | 58.06% | 98.78% | 68.85% | 70.01% | 0.5490 | 0.9371 | 0.7058 | 76.41%
Stage-2 | 70.92% | 90.98% | 47.80% | 97.81% | 66.74% | 68.74% | 0.5285 | 0.9156 | 0.7067 | 74.85%

Split F Distribution and Patch-level Classifier Test Results

Training set

Data Type CC LGSC EC MC HGSC Total
Patient 21 9 18 6 50 104
Slide 33 21 41 7 103 205
Patch 17.16% 19.44% 13.73% 9.38% 40.30% 105482

Validation set

Data Type CC LGSC EC MC HGSC Total
Patient 5 2 5 1 13 26
Slide 9 5 7 1 27 49
Patch 16.33% 10.54% 11.32% 2.91% 58.91% 39344

Test set

Data Type CC LGSC EC MC HGSC Total
Patient 6 3 5 2 13 29
Slide 11 3 7 3 27 51
Patch 12.62% 10.19% 11.73% 39.90% 25.57% 16690

Baseline, Stage-1 and Stage-2 test results

Model | CC | LGSC | EC | MC | HGSC | Weighted Accuracy | Kappa | AUC | F1 Score | Average Accuracy
Baseline | 94.78% | 31.24% | 65.25% | 39.14% | 51.64% | 51.61% | 0.3993 | 0.8751 | 0.5124 | 56.41%
Stage-1 | 95.58% | 33.35% | 90.04% | 33.88% | 58.13% | 54.40% | 0.4404 | 0.8893 | 0.5464 | 62.20%
Stage-2 | 95.96% | 28.12% | 53.65% | 41.63% | 75.05% | 57.06% | 0.4615 | 0.8287 | 0.5492 | 58.88%