the reasoning behind attention heatmaps code
clemsgrs opened this issue · 1 comment
Hi, I went through the attention heatmap generation code thoroughly, and there is one thing I have trouble understanding.
I'd love to hear your take on it, as it would help me fill in the part of the picture I'm missing.
To keep it simple, let's focus on the create_patch_heatmaps_indiv function.
HIPT/HIPT_4K/attention_visualization_utils.py, line 311 (commit b5f4844)
In the line above, you take the (240, 240) bottom crop of the input patch and paste it into the top-left corner of a white (256, 256) image. Then, you retrieve the attention scores for the original input patch, as well as for patch2.
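To make sure I understand the crop-and-paste step, here is a minimal sketch of what I believe it does, using PIL (the variable names and the gray stand-in patch are mine, not the repository's):

```python
# Illustrative sketch (assumed names); a gray image stands in for a real patch.
from PIL import Image

patch = Image.new("RGB", (256, 256), "gray")  # stand-in for the input patch

offset = 16
# Take the bottom (240, 240) crop of the 256x256 patch ...
crop = patch.crop((offset, offset, 256, 256))  # (left, upper, right, lower)

# ... and paste it into the top-left corner of a white (256, 256) canvas,
# so the bottom/right 16-pixel margin of patch2 is white padding.
patch2 = Image.new("RGB", (256, 256), "white")
patch2.paste(crop, (0, 0))
```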
Eventually, you combine both attention scores in the following lines:
HIPT/HIPT_4K/attention_visualization_utils.py, lines 342 to 346 (commit b5f4844)
Here, all you do is restrict the attention scores from scores256_2 to those corresponding to the tissue crop in patch2.
Then, you sum scores256_1 and new_scores256_2, making sure to divide by a twice-as-large weight (200) in the region where scores256_1 and scores256_2 overlap (because both cover the same tissue crop there).
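If I translate my understanding of that blending into a NumPy sketch (again, names and shapes are my assumptions, and I use random arrays in place of real attention scores):

```python
# Illustrative sketch of the overlap-aware blending (assumed names/shapes).
import numpy as np

s = 256
offset = 16
scores256_1 = np.random.rand(s, s)  # scores for the original patch
scores256_2 = np.random.rand(s, s)  # scores for the shifted patch2

# Keep only the portion of scores256_2 that corresponds to real tissue
# (the top-left (240, 240) region), shifted back to its original position.
new_scores256_2 = np.zeros_like(scores256_2)
new_scores256_2[offset:s, offset:s] = scores256_2[: s - offset, : s - offset]

# Divisor map: 100 where only scores256_1 contributes,
# 200 where both maps overlap (so overlapping scores are averaged).
overlay = np.ones_like(scores256_2) * 100
overlay[offset:s, offset:s] += 100

score256 = (scores256_1 + new_scores256_2) / overlay
```

In the non-overlapping strip this reduces to scores256_1 / 100 (the same scaling as the simple path), while in the overlap it is the average of the two score maps, each divided by 100.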
I drew a summary of what is happening:
My question then boils down to: what is the reasoning behind blending a shifted crop, rather than simply computing scores256 via:
_, a256 = get_patch_attention_scores(patch, model256, device256=device256)
score256 = get_scores256(a256[:,i,:,:], size=(s,)*2)
score256 = score256 / 100
Thanks!
Hi @clemsgrs
Great visualization. The main reasoning for blending was to produce smooth attention maps similar to those of the heatmap generation code in CLAM, which similarly performs block blending. As you have described and visualized, a small offset is added so that the image is shifted by 16 pixels, and the scores are averaged in such a way that the padded portion of the image does not contribute to the heatmap. Ultimately, you get a much smoother heatmap that may aid not only qualitative analysis but also potential post-hoc zero-shot segmentation and detection applications. Since both ViTs have patch sizes of 16 (instead of 8), this patch-overlap strategy may help with the latter applications.
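A toy 1D example (my own, not from the repository) of why averaging a map with a shifted copy of itself softens block boundaries:

```python
# Toy illustration: blending a blocky signal with a shifted copy of itself
# produces intermediate values at block boundaries, i.e. a smoother map.
import numpy as np

# A blocky "attention map": constant over 4-pixel blocks, values 0 or 1 only.
blocks = np.repeat(np.array([0.0, 1.0, 0.0, 1.0]), 4)

shift = 2
shifted = np.empty_like(blocks)
shifted[shift:] = blocks[:-shift]
shifted[:shift] = blocks[:shift]  # keep original values at the left edge

# Equal-weight average of the two aligned maps.
blended = (blocks + shifted) / 2
# The blended signal takes the intermediate value 0.5 around each block
# boundary, whereas the original jumps directly between 0 and 1.
```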