Create model with multiple classes
pr4deepr opened this issue · 19 comments
Hi @andreped
Thanks for this great software and workflow.
I would like to generate a model which classifies the epithelia in the colon from WSIs into villi (usually at the boundary) and crypts (circular structures). Essentially, there would be 3 classes:
- background
- villi and
- crypts
My plan is to use your models to generate annotations from my WSIs, import into QuPath, correct them, and then split annotations into different classes.
How do I configure image export for 3 different classes from QuPath, train in MIB and then run on FastPathology?
Is it possible to do this and is this a good workflow idea?
Cheers
Pradeep
The current implementation does not have multi-class support. The reason is that it was challenging to make this generic for all the scripts in the pipeline. However, this limitation only concerns the QuPath side.
Both MIB and FastPathology support multi-class without any issues. The same really goes for QuPath itself, but the current scripts do not.
Your idea seems fine. @SahPet has worked a lot with multi-class segmentation using our pipeline. We have not made his implementation available yet, but we could prioritize this if you'd like.
For now, just to get everything up and running and to get familiar with the pipeline, I would start with binary segmentation, that is, neglecting one of your classes and doing background vs. villi or background vs. crypts. Do not merge the villi and crypts classes; I don't think that makes a lot of sense in your case. (However, if you were to do multi-class nuclei segmentation, merging all types of nuclei into a single class likely makes sense.)
For getting familiar with the workflow, I would recommend following the tutorial video, especially if you run into issues anywhere. And of course, feel free to ask questions! Happy to help :]
Thanks @andreped
Any sort of support for this would be appreciated.
If you do have somewhat of a pipeline I'd be very interested in using it.
Currently, we're interested in segmenting regions of the tissue, followed by downstream analysis which looks at multi-class nuclei or just looks at % staining in the area of interest. The common starting point is the need to segment the villi/crypts.
Thanks for the suggestion about generating training data. I'll get started on this.
Cheers
Pradeep
Yes, hopefully @andreped will be able to add multi-class support in the QuPath import scripts. Just a note about the dataset: these are colon biopsies, not small intestine biopsies, so they do not have any villi. Did you mean "surface epithelium" and "crypts"?
Also, as @andreped said, it's possible to do this with MIB only, which supports multi-class annotation and training (André said your images were quite small, so a proper WSI reader like QuPath might not be necessary).
For an introduction on how to use MIB/DeepMIB for training, a good introductory tutorial from Ilya Belevich, who created the software, is available here
https://youtu.be/gk1GK_hWuGE
and here
https://youtu.be/iG_wsxniBKk
> Currently, we're interested in segmenting regions of the tissue, followed by downstream analysis which looks at multi-class nuclei or just looks at % staining in the area of interest. The common starting point is the need to segment the villi/crypts.
What you are mentioning here is actually something @SahPet is interested in having as well, and something we are considering adding as a feature in FastPathology directly. However, it is challenging to generalize, and it might be better to deploy the model on your WSIs in FastPathology, export the results, and import them into QuPath for doing this type of analysis. @SahPet, perhaps time that you started writing applications again? ;)
Also, regarding refining annotations (an active learning pipeline), it is possible to do so using only MIB, if it is feasible to annotate based on patch-level information alone. However, QuPath is probably the best solution for this, as it is beneficial to correct predictions while being able to see the entire WSI, if needed. Then, for deployment/inference of the models trained in MIB, I would recommend FastPathology, but you can also export patch predictions from MIB, as shown in the tutorial video.
Thanks @SahPet
> These are colon biopsies, not small intestine biopsies, so they do not have any villi. Did you mean "surface epithelium" and "crypts"?
Yes, that's what I meant. It's been years and I still get this wrong.
Currently, what I can do:
- I can export my annotations as a multi-class label image from QuPath.
- From the videos both of you shared, it looks like we can do multi-class training in MIB.
- I realised I can modify the QuPath pyramidal TIFF import script (the old one) by changing the `threshold` parameter to import multi-class annotations.
As you said:
> Both MIB and FastPathology support multi-class without any issues.
I just wanted to confirm: if I have a model generated from MIB, can I use the multi-class model within FastPathology for inference?
What do I need to change in the model config file for this?
Cheers
Pradeep
Yes, you can use a multi-class model with FastPathology without any issues. The export method should also support it, as it creates a uint8 tiled, pyramidal TIFF, where each integer value is assigned to a class. Hence, background = 0, Class1 = 1, Class2 = 2, etc. And as you mentioned, if you are able to modify the script to support multi-class import, then QuPath also supports importing these.
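To make that encoding concrete, here is a minimal Python sketch (assuming the `tifffile` package; the file name is hypothetical) that reads the lowest-resolution level of such an exported segmentation and lists which labels are present:

```python
import numpy as np
import tifffile

# "prediction.tiff" is a placeholder name for a segmentation exported by
# FastPathology (uint8 tiled, pyramidal TIFF)
with tifffile.TiffFile("prediction.tiff") as tif:
    # read the lowest-resolution pyramid level to keep memory usage small
    labels = tif.series[0].levels[-1].asarray()

# each pixel value is a class label: background = 0, Class1 = 1, Class2 = 2, ...
values, counts = np.unique(labels, return_counts=True)
for value, count in zip(values, counts):
    print(f"label {value}: {count} pixels")
```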
Given that you have a multi-class model, all you need to do is provide the new model and the new config file. A config file which supports a 3-class segmentation model could look like this:
```
model_name:multiclass-512
name:multiclass-512
task:multiclass-512
problem:segmentation
resolution:high
magnification_level:10
mask_threshold:0.02
tissue_threshold:70
patch_overlap:0.05
batch_size:1
input_img_size_x:512
input_img_size_y:512
nb_channels:3
nb_classes:3
input_node:ImageInputLayer
output_node:Softmax_Layer_Transpose2
class_colors:0,0,255;0,255,0;255,0,0
class_names:Exterior;Class1;Class2
interpolation:0
pipeline:import;tissue_segmentation;batchgen;neural_network;stitch;render
batch_process:2
scale_factor:1.0f/1.0f
cpu:0
IE:TensorRT
```
Compare this config file to the original one to see what I changed:
https://github.com/andreped/NoCodeSeg/blob/main/source/example-model-config.txt
This config file is made for a model with 3 classes (see `nb_classes:3`). The only class-related changes I made were:
```
nb_classes:3
class_colors:0,0,255;0,255,0;255,0,0
class_names:Exterior;Class1;Class2
```
For `class_colors`, I simply added another RGB triplet for the new class (to set its color), and gave the new class a name in `class_names`; there is a short parsing sketch below to make the format explicit. Also note that I provided new names for this pipeline, to avoid it getting mixed up with the original one (if you already have existing pipelines imported in FastPathology):
```
model_name:multiclass-512
name:multiclass-512
task:multiclass-512
```
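And just to make the `class_colors`/`class_names` format fully explicit, here is a throwaway Python sketch of how such lines could be parsed (illustrative only, not FastPathology's actual parser):

```python
# illustrative parsing of the two class-related config lines (not FastPathology's code)
class_colors = "0,0,255;0,255,0;255,0,0"
class_names = "Exterior;Class1;Class2"

colors = [tuple(int(c) for c in triplet.split(",")) for triplet in class_colors.split(";")]
names = class_names.split(";")

for label, (name, rgb) in enumerate(zip(names, colors)):
    print(f"label {label}: {name} -> RGB {rgb}")
```

Classes are matched by position; here `Exterior` plays the role of the background class.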
Thanks a lot. This is extremely useful.
I'll be generating some annotation data and testing it out soon.
Will report back on how I go!
Question: with the model you've trained, how well does it work on WSIs of colon tissue stained with other IHC markers that may label different cell types?
@pr4deepr It works quite well if you use the CD3-trained network/dataset, though of course it depends on which IHC marker you are using. We've tested it on one other marker and it worked well. The biggest challenge in general, I think, is the sensitivity of these networks to color differences between scanners and labs. This is an issue we and others are working on (testing different color normalization methods before training, etc.; @andreped has done some work on this).
Thanks for this.
It's good to know which network I should be focusing on.
I'm assuming there's a need for annotated datasets from different labs, and particularly different scanners?
@pr4deepr You do not necessarily need lots of data from a variety of different labs and scanners, but you do need to train your model to be invariant to these differences. This can be done either through data augmentation (e.g. color augmentation) or stain normalization; often the two are combined. For instance, you might add artificial blurring to the images, to make the network more invariant to blurring effects, or make artificial copies by rotating the images, to make the network more invariant to rotations. These augmentation tricks are all available in MIB.
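If you ever want to reproduce this kind of augmentation outside MIB, a minimal sketch using the `albumentations` Python package (my example here, not something MIB itself uses) could look like this:

```python
import numpy as np
import albumentations as A

# augmentations in the spirit of the ones described above: color jitter,
# blurring, and rotations/flips; the ranges are illustrative, not tuned values
augment = A.Compose([
    A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20, val_shift_limit=10, p=0.5),
    A.GaussianBlur(blur_limit=(3, 5), p=0.2),
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
])

patch = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in for a real RGB patch
augmented = augment(image=patch)["image"]
```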
Regarding stain normalization, it is trickier. It is not necessary for all applications, as color augmentation might suffice, but in practice, at least for very complex and tricky tasks, normalization seems to be beneficial.
As MIB does not support stain normalization (at least not with the methods we would like to use, such as Macenko), I have created a simple tool for stain normalization, which you could test if you'd like:
https://github.com/andreped/fast-stain-normalization
It runs on the GPU if you'd like, and can be run in parallel, which makes it very efficient. It is also easy to use.
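For the curious, the core of the Macenko method is quite compact. Below is a rough NumPy sketch of the stain-matrix estimation step, ignoring the robustness tricks a real implementation (like the tool above) needs:

```python
import numpy as np

def macenko_stain_matrix(rgb, beta=0.15, alpha=1.0):
    """Rough Macenko (2009) stain-matrix estimation for an RGB patch."""
    # optical density; the epsilon avoids log(0) on saturated pixels
    od = -np.log(np.maximum(rgb.reshape(-1, 3).astype(float) / 255.0, 1e-6))
    od = od[np.all(od > beta, axis=1)]  # discard near-transparent background pixels

    # project the OD cloud onto its two principal directions
    _, eigvecs = np.linalg.eigh(np.cov(od.T))
    plane = eigvecs[:, 1:]  # eigenvectors of the two largest eigenvalues
    proj = od @ plane

    # the extreme angles in this plane approximate the two stain vectors
    angles = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(angles, [alpha, 100 - alpha])
    stains = np.stack([plane @ [np.cos(a), np.sin(a)] for a in (lo, hi)], axis=1)
    return stains / np.linalg.norm(stains, axis=0)  # 3x2 matrix, columns ~ the two stains
```

Normalization then solves for per-pixel stain concentrations against this matrix and re-composes the patch with a fixed reference stain matrix; the linked tool does all of this for you (and on the GPU, if you'd like).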
Especially for mitosis detection on data from different scanners, we see a huge improvement in performance using stain normalization.
As with everything I make, the solution is openly available and free to use.
However, if you use stain normalization in preprocessing, you have to do the same during deployment (that is, when you apply your trained model on new WSIs). It might still work, but it will perform worse if you do not use stain normalization during deployment, as the model expects this to be done in preprocessing.
FastPathology does not currently have such a solution available for use, even though we have one implementation in-house which you could test (when you get that far). I was considering benchmarking it first, before making it openly available for everyone. Not sure yet.
MIB also does not have a stain normalization method, so this you would have to do outside these tools. My CLI is quite easy to use: simply install and run. See here for how to use it:
https://github.com/andreped/fast-stain-normalization#usage
However, MIB contains data augmentation methods you could try, which should make your network more invariant to data from different scanners. That said, when it comes to stain variation, doing too much might also degrade performance, so experiment carefully with hyperparameters and study the trade-offs. Focus on improving performance on the validation set.
Nonetheless, I often do HSV augmentations in my use cases, and have had great success doing so. This augmentation method is available in MIB. So perhaps you could start by testing that first. Then you do not need all this stain normalization stuff, which adds a whole layer of complexity to your pipeline.
Thanks for this Andre. Appreciate the pointer to all these resources.
It makes sense to use the augmentation techniques, especially the "color augmentation" as you mentioned.
We've been running the FastPathology models and realised that separating out the hematoxylin channel and running the CD3 models on it gives really good results. This should be a good start for generating annotations as well. I'll definitely be using your pointers when we get to the training side of things.
Hope you all have a good break!
Likewise! We should also add a color deconvolution method for extracting the H and E channels to FAST in the future.
As it does not have such a method now, for training your new models, just use the raw data as input and do color augmentation during training. That should work well for this task.
Just remember that those color deconvolution methods tend to have some numerical instabilities when the H and/or E channel is not present. The same applies to color normalization methods such as Macenko.
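In the meantime, extracting the hematoxylin channel outside FAST is easy with scikit-image's built-in Ruifrok-Johnston color deconvolution. Note that it uses a fixed, standard stain matrix, which is exactly where the instabilities mentioned above can appear:

```python
import numpy as np
from skimage.color import rgb2hed

rgb = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)  # stand-in for a real H&E/IHC patch

hed = rgb2hed(rgb)          # separate into Hematoxylin, Eosin and DAB channels
hematoxylin = hed[:, :, 0]  # hematoxylin optical-density map

# if a stain is truly absent, its channel is dominated by noise -
# the numerical instability mentioned above
```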
Remember to take a break as well! Happy holidays
@pr4deepr Hello! Hope you are doing well.
I was just going to say that I have been working on adding some new features to FP. One of the key features is support for batch mode and the possibility to run generic, modifiable pipelines (including an option to perform stain normalization in preprocessing, and a lot more!).
@SahPet is currently testing this solution, and it seems to be working well. There are some bugs to be fixed, but at least it runs on his machine and setup. He is also going to test multi-class models, and for that we were thinking of adding multi-class support to the import scripts in QuPath.
You mentioned that you got multi-class running by modifying the script? Are you able to share those modifications, so I can see if they are generic enough to be added? If so, you could either make a pull request or I could add them for you :]
Thanks for this!
A batch mode to run through all images in a folder would be great for FP.
Even better if we don't need to write code!
I made a pull request, but it's for exporting multi-class tiles. You may already have that sorted.
For importing multi-class tiles:
- I had to re-run the script and change the `className` every time.
- I also changed the `masksPath` to point to the new masks.
I didn't spend much time on this, as I don't have any multi-class predictions yet!
@pr4deepr I will accept this PR, as you made a new script, but I will attempt to merge it into the main script, such that there is only one export script in the future.
I will look into adding multi-class support for all scripts quite soon.
@pr4deepr If you still have not been able to download the entire dataset, I am currently publishing it on a personal Google Drive, at least until the DataverseNO team discovers why people are having issues.
The dataset will be fully uploaded in a few hours, but parts of it are already published. To access the data, go here: https://drive.google.com/drive/folders/1eUVs1DA1UYayUYjr8_aY3O5xDgV1uLvH?usp=sharing
As you had problems, I have informed other potential users of this alternative way to download the data in the data section of the README. Thank you for informing me of this issue :]
@pr4deepr Zenodo was one of the first solutions I considered. However, AFAIK they only accept data up to 50 GB per repository, which was just too low for our purpose. In addition, we wanted to find a platform we could use for future datasets as well. DataverseNO seemed like the only option at the time, and it is also free to upload to (at least for us with an affiliation to NTNU). We are also frequently in contact with them to improve their solution, which I really like.
@pr4deepr A script for importing multi-class predictions (TIFFs) from FP has now been implemented:
https://github.com/andreped/NoCodeSeg/blob/main/source/importPyramidalTIFF_multiclass.groovy#L26
I don't have a model with more than three classes (including background; hence, only two classes of interest), but it worked well in my setup, both for a single WSI and for RunForProject.
That means that the full multi-class workflow is now possible: annotate data in QuPath, export it as annotated tiles, train a model in MIB, run inference with the trained model in FP, and import the final results back into QuPath.
I haven't had the time to adapt the modifications from this script to the main script yet, but perhaps that is an easy task you could do, @SahPet, when you are testing multi-class import/export of tiles?
Anyways, as the main topic of this issue has been solved, I am closing this issue for now.
Thank you for the contribution and helping me improve the pipeline :]