The paper mentions visualizing the attention map of the last layer. How is this done?
notfacezhi opened this issue · 4 comments
Hi,
In https://github.com/facebookresearch/detr#notebooks the first notebook has the code to visualize the images that we used in the paper, including the attention matrix.
Each point in the original image corresponds to a row (or column) of the attention matrix, and that row can be reshaped into an image.
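To make the reshape concrete, here is a minimal sketch. It assumes the self-attention for one image is a matrix of shape (H*W, H*W) over an H x W feature map flattened in row-major order. The tensor is random stand-in data, and the names attention_map_for_point, attn, H and W are hypothetical, not taken from the notebook.

```python
import torch


def attention_map_for_point(attn, h, w, row, col):
    """Return the (h, w) attention map for feature-map location (row, col).

    attn is assumed to have shape (h * w, h * w): entry [i, j] is how much
    flattened location i attends to flattened location j.
    """
    assert attn.shape == (h * w, h * w)
    idx = row * w + col  # row-major index of the query location
    return attn[idx].reshape(h, w)


# Stand-in data: a random row-stochastic matrix instead of real attention weights.
H, W = 4, 6
attn = torch.softmax(torch.randn(H * W, H * W), dim=-1)
point_map = attention_map_for_point(attn, H, W, row=2, col=3)
```

The resulting point_map can be passed to any image plotting call. Note that the coordinates are in feature-map space, so a pixel in the input image has to be divided by the backbone's downsampling factor first.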
I believe I've answered your question, so I'm closing this issue.
Hey @fmassa, thanks for the great DETR work! I've been trying to replicate some of the illustrations from the paper.
I'd expect the self-attention weights to come from the operation
attn = (q*scale) @ k.T
which weighs the values. However, looking at the Transformer class definition in the detr repo (https://github.com/facebookresearch/detr/blob/main/models/transformer.py#L127), the forward pass only returns the final tensor of dimensions (b, h * w, c).
I don't see how the hooks in the Colab notebook obtain the attention weights. Is there any other code that the Colab model used?
Did you figure this out?
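One possible way to get the weights without touching forward is a forward hook on the attention module itself, rather than on the whole Transformer. The sketch below is not the DETR or notebook code. It uses a standalone torch.nn.MultiheadAttention with made-up sizes and relies on that module returning a tuple (output, attention weights). It assumes the layer you care about wraps its attention in such a module and calls it with the default need_weights=True, so check that against the model you are actually using.

```python
import torch
from torch import nn

embed_dim, num_heads = 32, 4  # hypothetical sizes
seq_len, batch = 24, 1        # e.g. a flattened 4 x 6 feature map

self_attn = nn.MultiheadAttention(embed_dim, num_heads)
self_attn.eval()

captured = []


def save_attention(module, inputs, output):
    # output is the tuple (attn_output, attn_weights); keep only the weights
    captured.append(output[1])


handle = self_attn.register_forward_hook(save_attention)

x = torch.randn(seq_len, batch, embed_dim)  # (sequence, batch, channels)
with torch.no_grad():
    out, _ = self_attn(x, x, x)  # the caller is free to discard the weights
handle.remove()  # detach the hook once the forward pass has run

# (batch, seq_len, seq_len); averaged over heads by default
weights = captured[0]
```

The hook sees the module's full return value even when the enclosing layer keeps only the first element, which would explain why the weights never show up in the Transformer's own output.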