med-air/Endo-FM

Issues about attention type.

16rq opened this issue · 1 comment

16rq commented

Thank you for your work. It is great!

I have a question from running your code: what is the difference between 'space_only' and 'time_space_joint' attention? They appear to behave identically in the code.

Thanks for your interest! 'space_only' performs only spatial attention, while 'time_space_joint' performs both spatial and temporal attention, as you can see in the following code:

    if self.attention_type != 'space_only':
        self.time_embed = nn.Parameter(torch.zeros(1, num_frames, embed_dim))
        self.time_drop = nn.Dropout(p=drop_rate)

    # Time Embeddings
    if self.attention_type != 'space_only':
        cls_tokens = x[:B, 0, :].unsqueeze(1)
        x = x[:, 1:]
        x = rearrange(x, '(b t) n m -> (b n) t m', b=B, t=T)
        # Resizing time embeddings in case they don't match
        if T != self.time_embed.size(1):
            time_embed = self.time_embed.transpose(1, 2)
            new_time_embed = F.interpolate(time_embed, size=(T), mode='nearest')
            new_time_embed = new_time_embed.transpose(1, 2)
            x = x + new_time_embed
        else:
            x = x + self.time_embed
        x = self.time_drop(x)
        x = rearrange(x, '(b n) t m -> b (n t) m', b=B, t=T)
        x = torch.cat((cls_tokens, x), dim=1)
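
To make the difference visible, here is a minimal, self-contained sketch of the time-embedding step quoted above. It is not the repository's model: the class name `TimeEmbedSketch`, the toy sizes, and the `forward(x, B, T)` signature are assumptions made for illustration, and the `rearrange` calls are replaced with equivalent `reshape`/`permute` calls so that only PyTorch is needed. It assumes the input has shape `(B*T, 1 + N, M)`, as the rearrange patterns in the snippet imply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TimeEmbedSketch(nn.Module):
    """Hypothetical stand-in for the time-embedding step quoted above."""

    def __init__(self, attention_type, num_frames=8, embed_dim=16, drop_rate=0.0):
        super().__init__()
        self.attention_type = attention_type
        # Temporal parameters exist only when temporal attention is used.
        if self.attention_type != 'space_only':
            self.time_embed = nn.Parameter(torch.zeros(1, num_frames, embed_dim))
            self.time_drop = nn.Dropout(p=drop_rate)

    def forward(self, x, B, T):
        # x: (B*T, 1 + N, M), one class token plus N patch tokens per frame.
        if self.attention_type == 'space_only':
            return x  # frames remain separate sequences

        cls_tokens = x[:B, 0, :].unsqueeze(1)  # (B, 1, M)
        x = x[:, 1:]                           # (B*T, N, M)
        N, M = x.shape[1], x.shape[2]

        # Equivalent of rearrange '(b t) n m -> (b n) t m'.
        x = x.reshape(B, T, N, M).permute(0, 2, 1, 3).reshape(B * N, T, M)

        # Resize the time embedding if the clip length differs from num_frames.
        time_embed = self.time_embed
        if T != time_embed.size(1):
            time_embed = F.interpolate(
                time_embed.transpose(1, 2), size=T, mode='nearest'
            ).transpose(1, 2)
        x = self.time_drop(x + time_embed)

        # Equivalent of rearrange '(b n) t m -> b (n t) m'.
        x = x.reshape(B, N * T, M)
        return torch.cat((cls_tokens, x), dim=1)  # (B, 1 + N*T, M)


B, T, N, M = 2, 4, 9, 16  # toy sizes, chosen only for this sketch
x = torch.randn(B * T, 1 + N, M)

space_only = TimeEmbedSketch('space_only')
joint = TimeEmbedSketch('time_space_joint')

print(tuple(space_only(x, B, T).shape))  # (8, 10, 16)
print(tuple(joint(x, B, T).shape))       # (2, 37, 16)
print(hasattr(space_only, 'time_embed')) # False
```

With 'space_only', each frame keeps its own sequence of 1 + N tokens, so attention applied afterwards can only mix tokens within a frame. With any other attention type, the tokens of all T frames are merged into one sequence of 1 + N*T tokens per video and carry a time embedding, which is what lets later attention layers relate tokens across frames.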