MIL-pathology


Refining Coarse Annotations on Single Whole-Slide Image

Detailed and exhaustive annotations on whole-slide images (WSIs) are extremely labor-intensive and time-consuming to produce. In this repository, we provide implementations of two methods for refining coarse annotations into more accurate ones: (1) Deep k-NN (DkNN) and (2) Label Cleaning Multiple Instance Learning (LC-MIL). The figure below shows an example of coarse annotations and the refined annotations produced by one of our methods (LC-MIL).

[Figure: image.png]

Notably, although both methods are machine-learning based, the refinement can be conducted on each single slide, and NO external data is needed.

Dataset

Usage

DkNN

Training

python train_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_save_root       Where model will be saved
  
optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. Three functions are provided: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] to the
                        R, G, and B channels, respectively; 2: convert to a
                        gray image, then Otsu threshold. Default is 0. For a
                        new dataset, users are encouraged to write a custom
                        function.
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is False
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             gpu
  --lr LR               Initial learning rate, default is 0.00005
  --step_size STEP_SIZE
                        Step size for learning-rate decay, default is 1
  --reg REG             Reg, default is 10e-5
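
Option 2 of `--remove_blank` (gray conversion followed by an Otsu threshold) can be sketched as below. This is a minimal NumPy illustration and not the repository's own code: the function names `otsu_threshold` and `tissue_mask`, the unweighted channel average used as the gray image, and the convention that pixels at or below the threshold count as tissue are all assumptions made for the example.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 8-bit threshold that maximizes between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of the "dark" class
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of the "dark" class
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                  # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))


def tissue_mask(rgb: np.ndarray) -> np.ndarray:
    """Boolean mask that is True where a thumbnail pixel looks like tissue."""
    # Assumption: plain channel average as the gray image.
    gray = rgb[..., :3].mean(axis=2).astype(np.uint8)
    # Assumption: tissue is darker than the bright, blank background.
    return gray <= otsu_threshold(gray)


# Toy thumbnail: bright background with a darker 3x3 "tissue" block.
thumb = np.full((8, 8, 3), 240, dtype=np.uint8)
thumb[2:5, 2:5] = 100
mask = tissue_mask(thumb)
```

On a real slide the same idea would be applied to a low-resolution thumbnail, and the resulting mask would decide which patch locations are kept.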

Applying/Inference

python apply_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_dir             Where to load the model (to conduct feature
                        extraction)
  feature_save_root     Where the mapped features will be saved
  knn_save_root         Where the KNN results (distance and index) will be
                        saved
  heatmap_save_root     Where the predicted heatmap will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. Three functions are provided: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] to the
                        R, G, and B channels, respectively; 2: convert to a
                        gray image, then Otsu threshold. Default is 0. For a
                        new dataset, users are encouraged to write a custom
                        function.
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is False
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             gpu
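
The inference step above saves k-NN results (distance and index) computed on the extracted features. The sketch below illustrates what such a search could look like and how neighbours' coarse labels might be combined. It is only an assumption-laden example: `knn_search` and `refine_labels` are hypothetical names, brute-force Euclidean distance is used for simplicity, and the majority-vote rule is an illustrative choice rather than the repository's confirmed procedure.

```python
import numpy as np


def knn_search(features: np.ndarray, k: int):
    """Return (distance, index) of the k nearest neighbours of every row."""
    sq = np.sum(features ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.maximum(d2, 0.0, out=d2)    # clamp tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)   # a patch is not its own neighbour
    index = np.argsort(d2, axis=1)[:, :k]
    distance = np.sqrt(np.take_along_axis(d2, index, axis=1))
    return distance, index


def refine_labels(coarse_labels: np.ndarray, index: np.ndarray) -> np.ndarray:
    """Relabel each patch by majority vote over its neighbours' coarse labels."""
    votes = coarse_labels[index].mean(axis=1)
    return (votes > 0.5).astype(int)


# Two well-separated clusters, each containing one mislabeled patch.
cluster = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
features = np.vstack([cluster, cluster + 10.0])
coarse = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
distance, index = knn_search(features, k=3)
refined = refine_labels(coarse, index)
```

In this toy example the two mislabeled patches are surrounded by correctly labeled neighbours in feature space, so the vote flips them back.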

Template command

cd DkNN
python train_model.py ../Data test_016 .tif ../coarse_annotations.sav . 
python apply_model.py ../Data test_016 .tif ../coarse_annotations.sav model_test_016.pth . . . 

We cannot upload our test WSI, test_016.tif, to this repository because of GitHub's size limits, but you can find it on Google Drive.

LC-MIL

Training

python train_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  ca_path               The path to the coarse annotations. File format should
                        be `.sav`
  model_save_root       Where model will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. Three functions are provided: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] to the
                        R, G, and B channels, respectively; 2: convert to a
                        gray image, then Otsu threshold. Default is 0. For a
                        new dataset, users are encouraged to write a custom
                        function.
  --length_bag_mean LENGTH_BAG_MEAN
                        Average length of a bag (Binomial distribution),
                        default = 10
  --num_bags NUM_BAGS   Number of bags to train, default = 1000
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is FL
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             gpu
  --lr LR               Initial learning rate, default is 0.00005
  --step_size STEP_SIZE
                        Step size for learning-rate decay, default is 1
  --reg REG             Reg, default is 10e-5
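
The `--length_bag_mean` and `--num_bags` options describe bags whose lengths follow a Binomial distribution with a given mean. A rough sketch of such sampling is shown below. The help text does not state the Binomial parameters, so the choice of `2 * length_bag_mean` trials with success probability 0.5 is an assumption, as are the function name `sample_bags`, the clipping to at least one patch, and sampling patches without replacement.

```python
import numpy as np


def sample_bags(num_patches, num_bags=1000, length_bag_mean=10, seed=0):
    """Draw `num_bags` bags of patch indices with Binomial-distributed lengths."""
    rng = np.random.default_rng(seed)
    # Assumption: Binomial(n, p) with n * p == length_bag_mean, using p = 0.5.
    n_trials = 2 * length_bag_mean
    lengths = rng.binomial(n_trials, 0.5, size=num_bags)
    # Keep every bag non-empty and no larger than the pool of patches.
    lengths = np.clip(lengths, 1, num_patches)
    return [rng.choice(num_patches, size=int(n), replace=False) for n in lengths]


# 500 candidate patches from one label region, grouped into 200 bags.
bags = sample_bags(num_patches=500, num_bags=200, length_bag_mean=10)
mean_length = float(np.mean([len(b) for b in bags]))
```

Each bag would then inherit the coarse label of the region its patches were drawn from and serve as one training example for the MIL model.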

Applying/Inference

python apply_model.py 

positional arguments:
  slide_root            The location of the whole-slide image.
  slide_ID              Name of the whole-slide image.
  slide_format          Data format of the whole-slide image. Permitted formats
                        are `.svs`, `.ndpi`, and `.tif`.
  model_dir             The path to the MIL model
  heatmap_save_root     Where the predicted heatmap will be saved

optional arguments:
  -h, --help            show this help message and exit
  --remove_blank REMOVE_BLANK
                        How to remove blank regions (i.e. identify tissue
                        regions) of the WSI. Three functions are provided: 0:
                        convert to HSV, then Otsu threshold on the H and S
                        channels; 1: apply thresholds [235, 210, 235] to the
                        R, G, and B channels, respectively; 2: convert to a
                        gray image, then Otsu threshold. Default is 0. For a
                        new dataset, users are encouraged to write a custom
                        function.
  --length_bag_mean LENGTH_BAG_MEAN
                        Average length of a bag (Binomial distribution),
                        default = 10
  --num_bags NUM_BAGS   Number of bags to train, default = 1000
  --focal_loss FOCAL_LOSS
                        Whether or not to use focal loss (True: use focal
                        loss; False: use cross entropy), default is FL
  --patch_shape PATCH_SHAPE
                        Patch shape (size), default is 256
  --unit UNIT           Smallest unit when cropping patches, default is 256
  --gpu GPU             gpu
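
The `--focal_loss` option switches between focal loss and cross entropy. The difference between the two can be illustrated with the short NumPy sketch below for the binary case. The function names and the focusing parameter `gamma=2.0` are assumptions for illustration only; the value used in this repository is not stated in the help text.

```python
import numpy as np


def focal_loss(prob_pos, target, gamma=2.0, eps=1e-12):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    prob_pos = np.asarray(prob_pos, dtype=float)
    target = np.asarray(target)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(target == 1, prob_pos, 1.0 - prob_pos)
    p_t = np.clip(p_t, eps, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def cross_entropy(prob_pos, target):
    """Cross entropy is the special case of focal loss with gamma = 0."""
    return focal_loss(prob_pos, target, gamma=0.0)


probs = np.array([0.9, 0.6, 0.2])   # predicted probability of the positive class
labels = np.array([1, 1, 0])
ce = cross_entropy(probs, labels)
fl = focal_loss(probs, labels)
```

Because the factor `(1 - p_t)^gamma` shrinks the contribution of confidently correct predictions, focal loss puts relatively more weight on hard examples than cross entropy does.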

Template command

cd LC_MIL
python train_model.py ../Data test_016 .tif ../coarse_annotations.sav . 
python apply_model.py ../Data test_016 .tif model_test_016.pth .

We cannot upload our test WSI, test_016.tif, to this repository because of GitHub's size limits, but you can find it on Google Drive.

Post-processing

The post-processing procedure for both methods (DkNN and LC-MIL), along with an illustration, can be found in Post-process.ipynb.

Publication

@misc{wang2021label,
      title={Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images}, 
      author={Zhenzhen Wang and Aleksander S. Popel and Jeremias Sulam},
      year={2021},
      eprint={2109.10778},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}