lucidrains/vit-pytorch

Is it possible to use vit-pytorch's "Accessing Attention" feature with timm models?

Shima-shoki opened this issue · 0 comments

Hi, thanks as always for the great work on this project. I recently dove into transformer models and find them fascinating, especially the attention mechanism. I learned that the attention maps can be accessed with vit_pytorch.recorder.Recorder, but it doesn't seem to work with the pre-trained models available from timm. Here is my attempt to use Recorder with a timm model, followed by the resulting error.

import torch
import timm
from vit_pytorch.recorder import Recorder

model = timm.create_model('vit_base_patch16_224', pretrained=True, num_classes=2)
v = Recorder(model)

img = torch.randn(1, 3, 224, 224)
preds, attns = v(img)

The error message is:

AttributeError: 'VisionTransformer' object has no attribute 'transformer'

I understand that these are different libraries, so they can't be expected to work together out of the box, but it would be great if they could be combined. Is there any way to work around this issue?
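For reference, here is a minimal sketch of the kind of workaround I have in mind, using plain PyTorch forward hooks instead of Recorder. Judging from the error, Recorder looks for a `transformer` attribute that timm's VisionTransformer doesn't have, so this sketch hooks every `nn.Softmax` submodule it finds rather than relying on attribute names. `ToyAttention` and `record_attention` are hypothetical names made up for illustration, and the toy layer only stands in for a real model. Whether timm's attention layers expose their softmax as a submodule is an assumption I haven't verified; if the softmax is applied functionally, the hook would capture nothing.

```python
import torch
from torch import nn


class ToyAttention(nn.Module):
    """Hypothetical stand-in for one attention layer (not timm's or vit-pytorch's class)."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        # Softmax is a real submodule here, so a forward hook can observe its output.
        self.attend = nn.Softmax(dim=-1)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)  # each: (b, heads, n, head_dim)
        attn = self.attend(q @ k.transpose(-2, -1) * self.scale)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


def record_attention(model: nn.Module, x: torch.Tensor):
    """Run model(x) and collect the output of every nn.Softmax submodule."""
    recordings = []

    def hook(module, inputs, output):
        recordings.append(output.detach())

    handles = [
        m.register_forward_hook(hook)
        for m in model.modules()
        if isinstance(m, nn.Softmax)
    ]
    try:
        with torch.no_grad():
            preds = model(x)
    finally:
        # Always remove the hooks so the model is left unchanged.
        for h in handles:
            h.remove()

    if not recordings:
        raise RuntimeError("no nn.Softmax submodule was called; nothing to record")
    # Stack per-layer maps: (batch, layers, heads, tokens, tokens)
    return preds, torch.stack(recordings, dim=1)


# Toy usage: two attention layers, 5 tokens of width 32, 4 heads.
toy = nn.Sequential(ToyAttention(32, 4), ToyAttention(32, 4))
tokens = torch.randn(1, 5, 32)
preds, attns = record_attention(toy, tokens)
print(attns.shape)
```

If something like this could be pointed at a timm model (for example by passing the right submodule type to hook), that would already cover my use case.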