In any pretrained ViT setting, there is no CNN-based encoder here. That is to say, the pretrained ViT alone acts as the encoder. Am I right?
Sorry, I see now. The CNN-based encoder is wrapped inside the Embeddings class.