taoyang1122/adapt-image-models

why the second dimension is n*b?

Closed this issue · 4 comments

xt = rearrange(x, 'n (b t) d -> t (b n) d', t=self.num_frames)

Hi, it is combining the spatial dimension with the batch-size dimension, so that the following self-attention layer applies self-attention along the temporal dimension.

Why not rearrange to `(b n) t d` instead?

Because the self-attention is applied over the first dimension: PyTorch's `nn.MultiheadAttention` expects sequence-first input `(L, N, E)` by default (`batch_first=False`), so the axis to attend over (here, time) must come first.

Thank you for your reply! It turns out I was careless when reading the API definition.