average number of [4096,4096] regions per slide
clemsgrs opened this issue · 9 comments
Hi,
I'm trying to reproduce your results for the TCGA BRCA dataset. I've thus narrowed down the dataset to the 875 breast slides you use to train & evaluate your models (based on the .csv files you provided here).
As a first step, I ran the CLAM segmentation & patching pipeline on this dataset, using the preset parameters the group provides for TCGA BRCA (can be found here). When computing the average number of regions per slide, I get avg_M ~ 212.
The paper states that the average number of [4096,4096] regions per slide (avg_M) is around 38 when computed over the 10,678 FFPE slides from 33 cancer types in TCGA. I thought: "maybe TCGA BRCA slides are much bigger than those of other cancer types, which would explain such a big difference in avg_M".
To assess whether or not that assumption was true, I downloaded the pre-extracted "region-level" feature embeddings you kindly provide under 3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings. From these I can easily tell how many [4096,4096] regions you found for each of the 875 TCGA BRCA slides. After computing the average over these slides, I get avg_M ~ 30.
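A minimal sketch of the avg_M computation, assuming each slide's region-level embeddings load as a 2D array of shape [M, D] with one row per [4096,4096] region. The dictionary, slide IDs and feature size below are hypothetical placeholders; the actual on-disk format of the provided embeddings is not described in this thread.

```python
import numpy as np


def average_regions_per_slide(embeddings: dict[str, np.ndarray]) -> float:
    """Return avg_M, the mean number of [4096,4096] regions per slide.

    Assumes each value is an [M, D] array with one row per region.
    """
    if not embeddings:
        raise ValueError("no slides provided")
    counts = [emb.shape[0] for emb in embeddings.values()]
    return float(np.mean(counts))


# hypothetical toy data: three slides with 20, 30 and 40 regions, 8-d features
toy = {
    "slide_a": np.zeros((20, 8)),
    "slide_b": np.zeros((30, 8)),
    "slide_c": np.zeros((40, 8)),
}
print(average_regions_per_slide(toy))  # 30.0
```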
I was thus wondering if you had an idea of where such a big difference (212 vs. 30) may come from. Do you remember using the TCGA preset parameters when running CLAM, or did you use custom parameters?
Thank you!
Hi @clemsgrs
Thank you for your interest in our work. To check, what "level" are you patching at? Note that some slides start at 40X at level 0, while others start at 20X. 212 [4096,4096] regions seems a bit large (it means there are approximately 54,272 [256,256] patches). That number is more in line with my estimate of the average # of patches at 40X (~60K) than with the average # of patches at 20X (~15K).
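The arithmetic behind the 54,272 figure, as a quick check: a [4096,4096] region tiles into a 16 × 16 grid of [256,256] patches, and going from 20X to 40X doubles the resolution along each axis, so the same tissue area yields about four times as many patches (consistent with the ~15K vs. ~60K estimates).

```python
REGION_SIZE = 4096  # region side length in pixels
PATCH_SIZE = 256    # patch side length in pixels

# each [4096,4096] region holds (4096 / 256)^2 = 16 x 16 = 256 patches
patches_per_region = (REGION_SIZE // PATCH_SIZE) ** 2


def patches_for_regions(num_regions: int) -> int:
    """Total number of [256,256] patches contained in num_regions regions."""
    return num_regions * patches_per_region


print(patches_per_region)        # 256
print(patches_for_regions(212))  # 54272

# 40x vs. 20x: twice the pixels along x and along y, i.e. 4x the patches per area
area_factor = (40 // 20) ** 2
print(area_factor)               # 4
```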
An additional quality control step was performed to remove regions that were predominantly background (white space). I will get back to you on which parameters I used.
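The thread does not say how this quality control was implemented or which thresholds were used. As a purely illustrative sketch, one common approach is to treat near-white pixels as background and drop any region whose background fraction exceeds a cutoff; `white_threshold=220` and `max_background=0.9` below are hypothetical values, not the authors' parameters.

```python
import numpy as np


def background_fraction(region_rgb: np.ndarray, white_threshold: int = 220) -> float:
    """Fraction of pixels whose R, G and B values all exceed white_threshold.

    region_rgb: uint8 array of shape [H, W, 3].
    """
    is_white = np.all(region_rgb > white_threshold, axis=-1)
    return float(is_white.mean())


def keep_region(region_rgb: np.ndarray, white_threshold: int = 220,
                max_background: float = 0.9) -> bool:
    # hypothetical rule: keep a region unless it is predominantly background
    return background_fraction(region_rgb, white_threshold) <= max_background


# toy 4x4 region: 12 white pixels and one row of 4 pink "tissue" pixels
region = np.full((4, 4, 3), 255, dtype=np.uint8)
region[0, :] = (200, 120, 180)
print(background_fraction(region))  # 0.75
print(keep_region(region))          # True
```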
Oh good call, I wrongly assumed all slides from TCGA came with the same magnification at level 0.
After checking, the vast majority of them seem to be 40x at level 0, only a few being 20x. Given that I was patching at level 0, it makes sense I had approx. 4 times more patches. I've switched to patching at 20x.
With the CLAM preset parameters for TCGA, I now get avg_M ~ 60, which is closer to your average, though still double. The difference could come from the post-processing you mentioned plus a different set of parameters.
Hi @clemsgrs, may I know how to patch at 20x instead of 40x? Is there a value I should change in patch_level?
Hi, I've had to write some code in order to patch at 20x instead of 40x.
Also, reasoning in terms of magnification level is not as precise as reasoning in terms of pixel spacing (or resolution).
As said above, most TCGA BRCA slides are 40x at level 0: I inferred this from pixel spacing at level 0 (which is around 0.25 µm/pixel, and most scanners have such a spacing at 40x).
To switch to 20x, I looked for the level whose pixel spacing was closest to 0.50 µm/pixel (doubling the pixel spacing halves the magnification).
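The level-selection idea can be sketched without OpenSlide: given the level-0 spacing and each level's downsample factor (in OpenSlide these come from `level_downsamples`), pick the level whose spacing is closest to the target. This is a hypothetical stand-alone version for illustration, not the `get_best_level_for_spacing` implementation from the linked repository. The `estimate_magnification` helper is also hypothetical and relies on the approximate convention that 40x corresponds to about 0.25 µm/pixel.

```python
def best_level_for_spacing(spacing_level0: float,
                           level_downsamples: list[float],
                           target_spacing: float = 0.50) -> int:
    """Index of the pyramid level whose spacing (µm/pixel) is closest to target_spacing."""
    spacings = [spacing_level0 * d for d in level_downsamples]
    return min(range(len(spacings)), key=lambda i: abs(spacings[i] - target_spacing))


def estimate_magnification(spacing: float) -> float:
    # approximate convention: 0.25 µm/pixel ~ 40x, 0.50 µm/pixel ~ 20x
    return 10.0 / spacing


print(best_level_for_spacing(0.25, [1.0, 2.0, 4.0]))   # 1 (0.50 µm/pixel)
print(best_level_for_spacing(0.25, [1.0, 4.0, 16.0]))  # 0 (0.25 is closer than 1.0)
print(estimate_magnification(0.25))                    # 40.0
```

Note the second call: if a slide's pyramid jumps straight from a downsample of 1 to 4 (0.25 to 1.0 µm/pixel), level 0 is still the closest match to 0.50 µm/pixel, so reaching 20x would then require downsampling the tiles after reading them.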
My slight adaptation of CLAM pre-processing code should be available here.
You can take a look at the get_best_level_for_spacing function here. Just like I did, you only need to call it once at the beginning of the process_contour method (see here).
Hi @clemsgrs, thanks for the quick response. How do you get a pixel spacing value like 0.25 µm/pixel at level 0?
What about the camelyon16 dataset? Is it also 40x at level 0?
Look at the get_best_level_for_spacing function to see how I get the pixel spacing at level 0:
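```python
# tiff.XResolution / tiff.YResolution are in pixels per centimeter
# (per tiff.ResolutionUnit), so dividing by 10,000 gives pixels per micron
x_res = float(self.wsi.properties["tiff.XResolution"]) / 10000
y_res = float(self.wsi.properties["tiff.YResolution"]) / 10000
# invert to get the pixel spacing in µm/pixel
x_spacing, y_spacing = 1 / x_res, 1 / y_res
```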
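A stand-alone variant of the same conversion, written against a plain mapping so it can be tried without a slide file (OpenSlide's `properties` behaves like a read-only mapping). It additionally honors `tiff.ResolutionUnit` (assuming the values `"centimeter"` or `"inch"`) and prefers OpenSlide's standard `openslide.mpp-x` / `openslide.mpp-y` properties when present. The function name and this fallback logic are assumptions added for illustration, not part of the original code.

```python
from collections.abc import Mapping

# microns per resolution unit for the TIFF ResolutionUnit values handled here
_MICRONS_PER_UNIT = {"centimeter": 10_000.0, "inch": 25_400.0}


def pixel_spacing_um(props: Mapping[str, str]) -> tuple[float, float]:
    """Return (x_spacing, y_spacing) in µm/pixel at level 0."""
    # prefer OpenSlide's normalized microns-per-pixel properties when present
    if "openslide.mpp-x" in props and "openslide.mpp-y" in props:
        return float(props["openslide.mpp-x"]), float(props["openslide.mpp-y"])
    # otherwise fall back to the raw TIFF tags, expressed in pixels per unit
    unit = props.get("tiff.ResolutionUnit", "centimeter")
    if unit not in _MICRONS_PER_UNIT:
        raise ValueError(f"unsupported resolution unit: {unit!r}")
    microns = _MICRONS_PER_UNIT[unit]
    x_res = float(props["tiff.XResolution"]) / microns  # pixels per micron
    y_res = float(props["tiff.YResolution"]) / microns
    return 1 / x_res, 1 / y_res


print(pixel_spacing_um({"tiff.XResolution": "40000", "tiff.YResolution": "40000",
                        "tiff.ResolutionUnit": "centimeter"}))  # (0.25, 0.25)
```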
Hi @clemsgrs, I managed to extract 4k patches at 20x (I think) using the slight adaptation of CLAM with your code. However, they all look the same, as only one axis is moving (in my case, only x). Is that normal?
Hi, have you made sure that the step_size parameter in the config file is equal to the patch_size parameter?
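To illustrate why `step_size` matters, here is a hypothetical, simplified grid generator (it skips the tissue-contour check CLAM also applies). With the stride equal to the patch size, the grid is non-overlapping and both x and y advance. The `downsample` argument reflects that OpenSlide's `read_region` takes level-0 coordinates, so when reading at a lower-resolution level the stride in level-0 pixels is `step_size * downsample`.

```python
def grid_coordinates(width0: int, height0: int, patch_size: int,
                     step_size: int, downsample: float = 1.0) -> list[tuple[int, int]]:
    """Top-left (x, y) coordinates, in level-0 pixels, of patches fully inside the slide.

    width0, height0: slide dimensions at level 0.
    patch_size, step_size: in pixels at the level being read.
    downsample: that level's downsample factor relative to level 0.
    """
    footprint = int(patch_size * downsample)  # patch extent in level-0 pixels
    stride = int(step_size * downsample)      # grid stride in level-0 pixels
    return [
        (x, y)
        for y in range(0, height0 - footprint + 1, stride)
        for x in range(0, width0 - footprint + 1, stride)
    ]


coords = grid_coordinates(width0=16384, height0=16384, patch_size=4096,
                          step_size=4096, downsample=2.0)
print(coords)  # [(0, 0), (8192, 0), (0, 8192), (8192, 8192)]
```

If `step_size` were instead much smaller than `patch_size` (say 256 for a 4096 patch), consecutive patches along a row would overlap by 15/16 of their width, which is one plausible reason neighboring patches could look nearly identical.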
Closing this issue for now, but please feel free to re-open it, if there are more questions!