@misc{wang2020classification,
title={Classification of Epithelial Ovarian Carcinoma Whole-Slide Pathology Images Using Deep Transfer Learning},
author={Yiping Wang and David Farnell and Hossein Farahani and Mitchell Nursey and Basile Tessier-Cloutier and Steven J. M. Jones and David G. Huntsman and C. Blake Gilks and Ali Bashashati},
year={2020},
eprint={2005.10957},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
mkdir wsi_classification
cd wsi_classification
git clone https://github.com/AIMLab-UBC/MIDL2020
cd MIDL2020
Repo Structure
patch_level.py is the patch-level classifier entry. Inside this file, function train initialize models, create data loader, optimize model weights, log training information, etc. evaluate simply loads the trained model and apply the model on validation or testing sets.
slide_level.py is the slide-level classifier entry. Inside this file, function main initialize models, create a data loader, optimize model weights, and compute the 6-fold cross-validation.
config.py is the program that reads arguments from the user. Therefore, you can custom any hyperparameters or settings through config.py
models is the sub-module that contains implementation involves models.
Inside models sub-module, models/base_model.py is the template class for models/models.py. It defines various expected behaviours of a model, such as forward, optimize_weights, load_state, save_state, etc. Any models should inherit BaseModel.
Inside models sub-module, models/models.py is the models/networks.py interface. It initializes models/networks.py and optimizes the weights of models/networks.py. Different models in models/models.py are summarized in the following table.
Models
Usage
CountBasedFusionModel
Slide-Level model wrapper. It assumes an input of N*C matrix, where N represents the number of slides, C represents the number of classes.
DeepModel
Patch-Level model wrapper. It is a simple wrapper for out-of-box deep learning models. It takes one patch for each forward pass. It follows the standard deep learning training protocol.
Inside models sub-module, models/networks.py is the implementation of various networks. It simply defines network architecture and forward functions. Keep it as simple as possible.
data is the sub-module that contains data loader, preprocess, and post-process functions.
Inside the data sub-module, data/base_dataset.py is the template class for other data loaders. It simply defines data loader behaviour, and assign config.py settings into the current data loader. Moreover, it also modifies the patch ids if requires, such as change multi-scale model scales, etc.
Inside the data sub-module, data/patch_dataset.py is the main patch images or patch features data loader.
Dataset
Usage
SubtypePatchDataset
It loads patch images from H5 files and applies preprocess steps.
Inside the data sub-module, data/create_patient_groups.py is the helper script to split the dataset by patients.
utils is the sub-module that contains simple but useful function snippet.
Inside utils sub-module, utils/utils.py contains a wide range of useful small functions. Contributors should also add useful functions in here, and add docstring. Moreover, functions in here should be self-explained and as general as possible.
Inside utils sub-module, utils/subtype_enum.py defines enum for classes.
Patch extraction
First of all, the enum in utils/subtype_enum.py should be defined.
Afterwards, we extract the 1024 * 1024 patches and then downsampled to 512 * 512 and 256 * 256 using the extract_patches.py script. This script not only extract patches but also store the patches into an H5 file for easy data transfer and management. However, we use our data annotation file so the annotation parse and check portion need to change for other datasets.
We store the patches in the h5 files who has the format subtype_name/slide_id/patch_locaton_x_y and use .txt files to store the data entry ids.
Patch-level: train, validation and test
The following bash script is used to invoke training, validation and test:
We train Random Forests using 6-fold cross-validation on the results of six patch-level test set.
After changing the path to the six patch-level results in the slide_level.py, simply run python3 slide_level.py and it will output the slide-level results as well as save the trained model.
Our results
We include our patch-level and slide-level results in ./results/.
Datasets and Detailed Results
The epithelial ovarian carcinoma whole-slide pathology images used in this study will be available upon institutional review board approvals at a later date. We will update this page once the dataset is availablle.
Our dataset has the following distribution in terms of patients, slides, and 1024 * 1024 tumor patches:
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
32
14
28
9
76
159
Slide
53
29
55
11
157
305
Patch
16.49%
16.31%
12.93%
10.96%
43.31%
161516
We first randomly divided the datasets by patients into three groups. We denote these three groups as Group 1, Group 2, and Group 3.
Group 1 has the following distributions:
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
11
5
10
3
26
55
Slide
20
8
14
4
54
100
Patch
15.23%
10.43%
11.44%
13.93%
48.98%
56034
Group 2 has the following distributions:
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
11
5
10
3
26
55
Slide
16
8
26
3
60
113
Patch
12.30%
17.80%
15.51%
7.09%
47.30%
64855
Group 3 has the following distributions:
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
10
4
8
3
24
49
Slide
17
13
15
4
43
92
Patch
24.91%
22.05%
10.89%
13.03%
29.12%
40627
Slide-level and Patch-Level Results
For slide-level classification, we only use the patch-level test set results (as shown in the above figure) to build the input matrix to train random forests, and we report the 6-fold cross-validation slide-level results.
Split
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
baseline 6-fold cross-validation
83.02%
65.52%
54.55%
54.55%
80.25%
73.77%
0.5993
0.9391
0.6855
67.58%
stage-1 6-fold cross-validation
79.25%
79.31%
61.82%
54.55%
85.99%
78.69%
0.6730
0.9375
0.7414
72.18%
stage-2 6-fold cross-validation
86.79%
100.00%
74.55%
81.82%
90.45%
87.54%
0.8106
0.9641
0.8718
86.72%
For patch-level classification, we employ a 3-fold cross-validation scheme with a tweak. We use two of three patient groups as the training set and divide the remaining group equally by patient into two subgroups, one of the subgroups will be used as validation or test set. Therefore, we eventually have 6 different training, validation and test set.
Split A Distribution and Patch-level Classifier Test Results
Training set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
22
10
20
6
52
110
Slide
36
16
40
7
114
213
Patch
13.66%
14.39%
13.62%
10.26%
48.08%
120889
Validation set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
4
2
12
25
Slide
7
3
9
3
19
41
Patch
17.31%
27.79%
7.85%
31.73%
15.33%
14594
Test set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
4
1
12
24
Slide
10
10
6
1
24
51
Patch
29.18%
18.83%
12.59%
2.55%
36.85%
26033
Baseline, Stage-1 and Stage-2 test results
Model
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
Baseline
99.22%
89.60%
73.89%
100.00%
75.05%
85.33%
0.8015
0.9739
0.8546
87.55%
Stage-1
99.42%
79.05%
63.58%
99.85%
74.70%
81.97%
0.7543
0.9651
0.8105
83.32%
Stage-2
99.50%
77.34%
72.70%
99.55%
72.64%
82.05%
0.7568
0.9658
0.8243
84.34%
Split B Distribution and Patch-level Classifier Test Results
Training set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
22
10
20
6
52
110
Slide
36
16
40
7
114
213
Patch
13.66%
14.39%
13.62%
10.26%
48.08%
120889
Validation set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
4
1
12
24
Slide
10
10
6
1
24
51
Patch
29.18%
18.83%
12.59%
2.55%
36.85%
26033
Test set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
4
2
12
25
Slide
7
3
9
3
19
41
Patch
17.31%
27.79%
7.85%
31.73%
15.33%
14594
Baseline, Stage-1 and Stage-2 test results
Model
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
Baseline
89.90%
76.58%
81.05%
20.89%
58.78%
58.84%
0.4986
0.8969
0.5698
65.44%
Stage-1
89.59%
74.04%
75.72%
44.58%
50.38%
63.89%
0.5521
0.8997
0.6122
66.86%
Stage-2
86.26%
64.32%
63.23%
37.93%
60.84%
59.13%
0.4965
0.8823
0.5711
62.52%
Split C Distribution and Patch-level Classifier Test Results
Training set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
21
9
18
6
50
104
Slide
37
21
29
8
97
192
Patch
19.30%
15.31%
11.21%
13.55%
40.63%
96661
Validation set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
6
3
5
2
13
29
Slide
7
3
13
2
25
50
Patch
6.87%
12.82%
12.82%
12.84%
54.64%
18990
Test set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
5
1
13
26
Slide
9
5
13
1
35
63
Patch
14.55%
19.86%
16.62%
4.71%
44.26%
45865
Baseline, Stage-1 and Stage-2 test results
Model
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
Baseline
95.11%
54.81%
80.17%
96.76%
63.07%
70.52%
0.6035
0.9335
0.7300
77.99%
Stage-1
84.44%
32.51%
77.52%
97.45%
81.99%
72.51%
0.6132
0.9276
0.7162
74.78%
Stage-2
97.06%
58.49%
78.55%
97.64%
68.81%
73.84%
0.6452
0.9507
0.7532
80.11%
Split D Distribution and Patch-level Classifier Test Results
Training set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
21
9
18
6
50
104
Slide
37
21
29
8
97
192
Patch
19.30%
15.31%
11.21%
13.55%
40.63%
96661
Validation set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
5
1
13
26
Slide
9
5
13
1
35
63
Patch
14.55%
19.86%
16.62%
4.71%
44.26%
45865
Test set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
6
3
5
2
13
29
Slide
7
3
13
2
25
50
Patch
6.87%
12.82%
12.82%
12.84%
54.64%
18990
Baseline, Stage-1 and Stage-2 test results
Model
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
Baseline
64.29%
83.37%
48.71%
99.84%
80.76%
78.30%
0.6790
0.9473
0.7212
75.39%
Stage-1
76.55%
79.10%
40.04%
98.24%
87.14%
80.77%
0.7056
0.9423
0.7383
76.21%
Stage-2
65.06%
84.11%
33.76%
95.41%
88.33%
80.10%
0.6881
0.9277
0.7261
73.33%
Split E Distribution and Patch-level Classifier Test Results
Training set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
21
9
18
6
50
104
Slide
33
21
41
7
103
205
Patch
17.16%
19.44%
13.73%
9.38%
40.30%
105482
Validation set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
6
3
5
2
13
29
Slide
11
3
7
3
27
51
Patch
12.62%
10.19%
11.73%
39.90%
25.57%
16690
Test set
Data Type
CC
LGSC
EC
MC
HGSC
Total
Patient
5
2
5
1
13
26
Slide
9
5
7
1
27
49
Patch
16.33%
10.54%
11.32%
2.91%
58.91%
39344
Baseline, Stage-1 and Stage-2 test results
Model
CC
LGSC
EC
MC
HGSC
Weighted Accuracy
Kappa
AUC
F1 Score
Average Accuracy
Baseline
79.29%
88.15%
26.77%
99.74%
65.01%
66.47%
0.5003
0.9149
0.6336
71.79%
Stage-1
61.00%
95.37%
58.06%
98.78%
68.85%
70.01%
0.5490
0.9371
0.7058
76.41%
Stage-2
70.92%
90.98%
47.80%
97.81%
66.74%
68.74%
0.5285
0.9156
0.7067
74.85%
Split F Distribution and Patch-level Classifier Test Results