PhenoTIL is a set of phenotypic features extracted from tumor-infiltrating lymphocytes (TILs) on H&E images. It is a multimodal pipeline that ultimately builds an immune-related biomarker that can be associated with the overall survival of lung cancer patients. The pipeline combines modules written in MATLAB, Python, and R, each handling a specific step. This work is part of an upcoming paper.
Main aspects of PhenoTIL:
- Nuclei segmentation with lymphocyte identification and cell-by-cell extraction of morphological features of immune cells.
- Feature analysis of immune cells with unsupervised clustering.
- High-quality visualization of the statistical analysis of the signature.
List of major frameworks/libraries used to build the project.
PhenoTIL consists of the following segments:
- Preprocessing of whole-slide images (WSI), followed by tile generation in MATLAB and nuclei segmentation #1 using a deep learning (DL) U-Net.
- Preprocessing of the H&E image, with nuclei segmentation using machine learning (ML) and identification of lymphocytes. This also includes feature extraction for the identified cells (done in MATLAB).
- Unsupervised clustering of the extracted features (done in Python).
- Visualization of the statistical analyses used in the paper (done in R).
HistoQC is an open-source quality control tool for digital pathology slides that can be run with Python. The project can be found here: HistoQC. Once the WSIs have been processed, the following steps can be performed.
A sample WSI can be found at this link: wsi sample
To extract small tiles from a WSI, we perform tile generation. MATLAB dependencies are already provided in the code/tilegen/ folder. Other dependencies come from the toolboxes shipped with MATLAB.
A further dependency is the OpenSlide library, which is needed to read WSI files. It is provided in the folder, but if issues arise (e.g. when running on Linux), the library can be found at the links below. The script was run using MATLAB R2022a (Academic Use).
The script run_phenoTIL_WSI_tileGeneration was written as a MATLAB Live Script. It performs tile generation in a few steps and also creates a binary mask from the annotations that isolate a tissue area (e.g. the tumor area).
- To run the tile generation, use the following lines from the main script:
We add the dependencies:
addpath(genpath('./code/tilegen/'))
addpath(genpath('./code/tilegen/openslide-3.4.1'))
addpath(genpath('./code/tilegen/libs_openslide'))
addpath(genpath('./code/tilegen/openslide-matlab'))
Then we simply run the script as:
mainTileGenerationV2 input_path_image input_path_annotation output_path_tiles image_format
For example:
mainTileGenerationV2 './data/test_set/wsi/' './data/test_set/wsi/' './output/matlab/tiles/' 'tiff'
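To make the idea of tile generation with an annotation mask concrete, here is a minimal Python sketch. It is not the MATLAB implementation behind mainTileGenerationV2: the function names `tile_grid` and `tiles_in_mask`, the tile size, and the 50% coverage threshold are all assumptions chosen for illustration. It only shows how an image can be cut into a grid and how tiles outside an annotated region can be discarded.

```python
from typing import Iterator, List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def tile_grid(width: int, height: int, tile_size: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering a width x height image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


def tiles_in_mask(mask: List[List[int]], tile_size: int,
                  min_fraction: float = 0.5) -> List[Tile]:
    """Keep tiles whose share of annotated (non-zero) mask pixels is at least min_fraction.

    The mask is a hypothetical stand-in for the binary annotation mask
    (e.g. tumor area), given as a list of rows.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    kept = []
    for x, y, w, h in tile_grid(width, height, tile_size):
        inside = sum(1 for row in mask[y:y + h] for value in row[x:x + w] if value)
        if inside / (w * h) >= min_fraction:
            kept.append((x, y, w, h))
    return kept


if __name__ == "__main__":
    # Toy 4x6 annotation mask: the left 4x4 block marks the tissue of interest.
    toy_mask = [[1, 1, 1, 1, 0, 0] for _ in range(4)]
    print(tiles_in_mask(toy_mask, tile_size=2))
```

In a real pipeline the tile coordinates would be passed to a WSI reader such as OpenSlide to extract the pixel data; that step is left out here.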
The pre-trained model can be found at this link: model folder. It should be placed in the PhenoTIL_V1/model folder.
- As the code was written in Python 3.8, to reproduce the nuclei segmentation we create a conda environment:
conda create -n nucleipy python=3.8
We then activate the environment:
conda activate nucleipy
- We then install the older versions of opencv-python, Pillow, and the other packages:
pip install -r requirements_nucleiSeg.txt
pip install --upgrade pip
pip install --upgrade tensorflow
pip install opencv-python
pip install Pillow
- We run the script, giving as input the folder containing the images (PNG format) and as output the folder where the masks are saved:
python run_phenoTIL_python38_nucleiSegmentationDL.py input_path output_path
For example, activate the environment and run the script:
conda activate nucleipy
python run_phenoTIL_python38_nucleiSegmentationDL.py '/data/test_set/' '/output/python/'
- The results are saved in the phenoTIL_V1/output/python/ folder, including the nuclei segmentation #1 mask test_mask.png.
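After a batch run it can be useful to confirm that every input tile produced a mask. The sketch below is a hypothetical helper, not part of the repository. It assumes that the mask for `name.png` is written as `name_mask.png`, by analogy with the test_mask.png example; adjust `mask_suffix` if the script uses a different naming rule.

```python
from pathlib import Path
from typing import List


def missing_masks(input_dir: str, output_dir: str,
                  mask_suffix: str = "_mask.png") -> List[str]:
    """Return the names of input PNG tiles that have no mask in output_dir.

    Assumption (not confirmed by the README): the mask for `name.png`
    is saved as `name` + mask_suffix.
    """
    out = Path(output_dir)
    missing = []
    for image in sorted(Path(input_dir).glob("*.png")):
        if not (out / (image.stem + mask_suffix)).is_file():
            missing.append(image.name)
    return missing
```

Calling `missing_masks('./data/test_set/', './output/python/')` returns an empty list when every tile has a corresponding mask.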
MATLAB dependencies are already provided in the code folder. Other dependencies come from the toolboxes shipped with MATLAB. The script was run using MATLAB R2022a (Academic Use).
To run the script, follow these steps:
- Open MATLAB and set the current directory to the phenoTIL main folder.
- Run the script in MATLAB as shown below. It runs using the nuclei segmentation #1 mask:
run_phenoTIL_matlabr2020b_featureExtraction
More specifically, the nuclei segmentation line is:
% Inputs: RGB image, color normalization (1 = yes), lower and upper values for the scales (2,4,8,10,12)
nuclei = getWatershedMask(img,1,4,12);
For the lymphocyte identification, the lines are:
% To get the lymphocytes from the nuclei mask and image we run it as
% load the trained lymphocyte model
lympModel = load('lymp_svm_matlab_wsi.mat');
lympModel = lympModel.model;
% Extract local (shape, size, intensity) features
[nucleiCentroids,feat_simplenuclei] = get_localcellfeatures(img,nuclei);
% Identify which is lymphocyte and which is not
isLymphocyte = (predict(lympModel,feat_simplenuclei(:,1:7)))==1;
% Identify the centroids of lymphocytes and non-lymphocytes
lympCentroids = nucleiCentroids(isLymphocyte==1,:);
nonLympCentroids = nucleiCentroids(isLymphocyte~=1,:);
% Represent them as a binary mask
rndLymp = round(lympCentroids);
rndnonLymp = round(nonLympCentroids);
bwLymp = bwselect(nuclei,rndLymp(:,1),rndLymp(:,2));
bwnonLymp = bwselect(nuclei,rndnonLymp(:,1),rndnonLymp(:,2));
% save the lymphocyte mask
imwrite(bwLymp,'./output/matlab/test_lymp.png')
imwrite(bwnonLymp,'./output/matlab/test_nonlymp.png')
The line of code for the extraction of phenoTIL features is shown next. It requires only the input path of the H&E images and the output path where the features are saved:
getAllFeatures_V2(input_path,output_path);
- (Optional) If the nuclei segmentation #1 mask is already saved, we can combine nuclei segmentation #1 with nuclei segmentation #2. This offers more options for identifying cells in an image sample.
We run the following lines of code (a script can be found at /code/libs/fusion_masks.m):
BW_ml = imread('./output/matlab/test_mask_ml.png'); % Load the mask from nuclei segmentation #2 (ML)
BW = imread('./output/python/test_mask_dl.png'); % Load the mask from nuclei segmentation #1 (DL)
BW = imbinarize(BW,'adaptive'); % Make it binary
BW = bwareaopen(BW, 30); % Remove small objects detected
BW = bwpropfilt(BW,'Area',[0 200]); % Remove bigger objects (e.g. grouped cells)
BW_comb = BW + logical(BW_ml); % Combine both masks
imwrite(BW_comb,'./output/python/test_mask_combined.png'); % Save the combined binary mask
- The results are saved in the phenoTIL_V1/output/matlab/ folder, including test images. The saved test.mat file contains the morphometric features of each identified lymphocyte. It is used for clustering in the Python script.
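The optional mask-fusion step described in this section (drop very small and very large objects from the DL mask, then merge with the ML mask) can be sketched in plain Python as follows. This is an illustrative re-expression of the logic, not a port of fusion_masks.m: the function names are hypothetical, masks are represented as lists of rows, and 8-connectivity is assumed for grouping pixels into objects.

```python
from collections import deque
from typing import List

Mask = List[List[bool]]


def filter_by_area(mask: Mask, min_area: int, max_area: int) -> Mask:
    """Keep only 8-connected objects whose pixel count lies in [min_area, max_area]."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    kept = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill to collect one connected object.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if min_area <= len(pixels) <= max_area:
                for r, c in pixels:
                    kept[r][c] = True
    return kept


def fuse_masks(dl_mask: Mask, ml_mask: Mask,
               min_area: int = 30, max_area: int = 200) -> Mask:
    """Area-filter the DL mask, then take the pixel-wise union with the ML mask."""
    cleaned = filter_by_area(dl_mask, min_area, max_area)
    return [[bool(a or b) for a, b in zip(row_dl, row_ml)]
            for row_dl, row_ml in zip(cleaned, ml_mask)]
```

The default thresholds of 30 and 200 pixels mirror the values used in the MATLAB lines; both masks are assumed to have the same dimensions.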
- As the code was written in Python 2.7, to reproduce the feature clustering we create a conda environment:
conda create --name phenotil_py2 python=2.7
We then activate the environment:
conda activate phenotil_py2
- We then install the older versions of scikit-learn, numpy, and the other packages:
pip install numpy==1.16.4
pip install "scikit-learn==0.19.0"
pip install pillow
pip install pandas
pip install matplotlib
pip install sio
pip install scipy
pip install joblib
pip install hdf5storage
- To run the Python script, we simply run the following once the conda environment is activated:
python run_phenoTIL_python27_featureClustering.py
- The cluster assignment of the cells is saved as phenoTIL_V1/output/python/test_cls.mat
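This README does not state which clustering algorithm the Python script uses. Purely to illustrate what unsupervised clustering of per-cell feature vectors involves, the sketch below implements plain k-means (Lloyd's algorithm) with no external dependencies. The choice of k-means, the function names, and the seeding rule are assumptions made for the example and should not be read as the method used in run_phenoTIL_python27_featureClustering.py.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int,
           iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each cell goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no cell changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy two-feature vectors standing in for per-cell morphometric features.
    cells = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    labels, centroids = kmeans(cells, k=2)
    print(labels)
```

In practice the feature columns would usually be standardized first so that no single feature dominates the distance.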
For the dependencies, make sure that the following libraries are installed in R.
- For hrbrthemes, follow the installation instructions at https://github.com/hrbrmstr/hrbrthemes
remotes::install_github("hrbrmstr/hrbrthemes")
- The required libraries are listed below:
plyr R.matlab survcomp Gmisc skimr Hmisc boot table1 survival survminer gsubfn ggplot2 ggnetwork ggforce waffle ggpubr uwot gridExtra grid cowplot lattice ggsci tidyverse colourlovers RColorBrewer hexbin viridis patchwork hrbrthemes circlize chorddiag TCGAWorkflowData DT TCGAbiolinks ggcorrplot ComplexHeatmap colorspace GetoptLong caret pheatmap EDASeq GISTools
- Some libraries can also be installed as:
devtools::install_github("mattflor/chorddiag")
install.packages("GISTools")
To run the visualization scripts in R, run the script:
run_phenoTIL_R_FigurePlot.R
The resulting figures are plotted in the R environment. Some examples are saved at /output/R/Rplots.pdf
Some screenshots and generated images are shown below.
The original H&E image sample and the plot of the identified cells (green: lymphocytes; red: non-lymphocytes).
Example plots generated in R: (left) the clustered cells and (right) a chord diagram of the immune cell composition of the clusters.
- Update the README
- Add back running scripts
- Add Additional Templates w/ Examples
- Add "components" document to easily copy & paste sections of the readme
- Multi-language Support
- Spanish
Future uses will be added as they are identified.
Distributed under the MIT License. See LICENSE.txt for more information.
Cristian Barrera - cbarrera31@gatech.edu
Project Link: https://github.com/maberyick/PhenoTIL_V1