A Python module that produces image patches and annotation masks from whole slide images for deep learning in digital pathology.

SlideSeg3

Author: Brendan Crabb brendancrabb8388@pointloma.edu
Modified Sep 12, 2019


Welcome to SlideSeg3, a Python 3 module, modified from SlideSeg, that segments whole slide images into usable image chips for deep learning. Image masks for each chip are generated from the associated markup and annotation files.

If you use this code for research purposes, please cite the following in your paper:

Brendan Crabb, Niels Olson, "SlideSeg: a Python module for the creation of annotated image repositories from whole slide images", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 105811C (6 March 2018); doi: 10.1117/12.2300262; https://doi.org/10.1117/12.2300262

Usage

  1. Environment
  2. Setup
    2.1 Parameters
    2.2 Annotation Key
  3. Run
  4. References

1. Environment

Go to the main directory.

1.1 Creating environment from .yml file

conda env create -f environment_slideseg3.yml

Creating the environment might take a few minutes. Once finished, issue the following command to activate the environment:

  • Windows: activate SlideSeg3
  • macOS and Linux: source activate SlideSeg3

If the environment was activated successfully, you should see (SlideSeg3) at the beginning of the command prompt.

OpenSlide and OpenCV are C/C++ libraries; as a result, they have to be installed separately from the conda environment, which contains all of the Python dependencies.
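For reference, installing the underlying system libraries on common platforms might look like the following (the exact package names are assumptions for illustration, not part of SlideSeg3 itself; consult the OpenSlide and OpenCV documentation for your platform):

  • Ubuntu/Debian: sudo apt-get install openslide-tools libopencv-dev
  • macOS (Homebrew): brew install openslide opencv

Once installed, running python -c "import openslide, cv2" inside the activated environment confirms that the Python bindings can find the libraries.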

2. Setup

Create a folder called 'images/' in the main directory and copy all of the slide images into this folder. Create a folder called 'xml/' in the main directory and copy the markup and annotation files (in .xml format) into this folder. It is important that each annotation file has the same file name as the slide it is associated with, as sketched below.
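A minimal sketch of the expected layout, using hypothetical file names (the slide format depends on your scanner; the important point is that each xml file shares its base name with its slide):

images/
    slide_001.svs
    slide_002.svs
xml/
    slide_001.xml
    slide_002.xml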

2.1 Parameters

Set the parameters in Parameters.txt (a sample file is sketched after the parameter descriptions below).

slide_path: Path to the folder of slide images

xml_path: Path to the folder of xml files

output_dir: Path to the output folder where image_chips, image_masks, and text_files will be saved

format: Output format of the image_chips and image_masks (png or jpg only)

quality: Output quality (JPEG compression level) if the output format is 'jpg'. A value of 100 is recommended, because jpg compression artifacts will distort image segmentation.

size: Size of image_chips and image_masks in pixels

overlap: Pixel overlap between image chips

key: The text file containing annotation keys and color codes

save_all: True saves every image_chip; False saves only chips containing at least one annotated pixel

save_ratio: Ratio of image_chips containing annotations to image_chips not containing annotations (use 'inf' if only annotated chips are desired; only applicable if save_all == False)

level: Magnification level(s) to process. Choose from highest (highest magnification), all, lowest (lowest magnification), or a specific magnification: 40.0, 20.0, 10.0, 5.0, 2.5, or 1.25. If the slide manufacturer did not provide the requested magnification, the next lower available magnification is used (e.g. 40x -> 20x).

cpus: Number of CPUs used to process multiple WSIs in parallel. If processing all levels, fewer than 4 CPUs are recommended to avoid running out of memory.
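For orientation, a filled-in Parameters.txt might look like the sketch below. The key: value layout and the specific paths and values here are assumptions for illustration; check the Parameters.txt shipped in the main directory for the exact syntax expected by main.py.

slide_path: images/
xml_path: xml/
output_dir: output/
format: png
quality: 100
size: 512
overlap: 64
key: Annotation_Key.txt
save_all: False
save_ratio: inf
level: 20.0
cpus: 4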

2.2 Annotation Key

The main directory should already contain an Annotation_Key.txt file. If no Annotation_Key file is present, one will be generated automatically from the annotation files in the xml folder.

The Annotation_Key file contains every annotation key with its associated color code. In all image masks, annotations with that key will have the specified pixel value. If an unknown key is encountered, it will be given a pixel value and added to the Annotation_Key automatically.
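As a quick sanity check after a run, you can confirm that the pixel values in a generated mask match the color codes listed in Annotation_Key.txt. A minimal Python sketch, assuming a hypothetical mask path (the real path depends on output_dir and your slide names):

import cv2
import numpy as np

# Hypothetical path to one generated mask; adjust to your output_dir and slide name.
mask_path = 'output/image_masks/slide_001_chip_0.png'

# Read the mask as a single-channel image and list the pixel values it contains.
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
if mask is None:
    raise FileNotFoundError(mask_path)
print('Pixel values in mask:', np.unique(mask))

Each listed value should correspond to one of the entries in Annotation_Key.txt.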

3. Run

Once the SlideSeg3 environment is activated, run the Python script 'main.py' from the main directory:
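python main.py

The script uses the settings from Parameters.txt described above.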

4. References

https://github.com/btcrabb/SlideSeg