ViT?
Opened this issue · 0 comments
sjjadsa commented
After feature extraction, can a vision transformer (ViT) be used to process the extracted features? Specifically, is it possible to apply position embeddings to those features?
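To make the question concrete, here is a minimal sketch of what I mean, assuming the extractor outputs a grid of feature vectors that is flattened row by row into a token sequence. The names `sinusoidal_position_embedding`, `add_position_embedding`, and `feature_map` are hypothetical and not from this repository. The sketch uses fixed sinusoidal embeddings only to stay self-contained; ViT itself uses learned position embeddings, which would replace the table here.

```python
import math

def sinusoidal_position_embedding(num_tokens, dim):
    """Return a num_tokens x dim table of fixed sinusoidal position embeddings."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(0, dim, 2):
            # Each pair of channels shares one frequency.
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table

def add_position_embedding(tokens):
    """Add a position embedding to each token (a list of equal-length vectors)."""
    num_tokens, dim = len(tokens), len(tokens[0])
    pos = sinusoidal_position_embedding(num_tokens, dim)
    return [[f + p for f, p in zip(tok, emb)] for tok, emb in zip(tokens, pos)]

# Hypothetical extractor output: a 2x2 grid of 4-dimensional features,
# flattened row by row into a sequence of 4 tokens.
feature_map = [[[0.0] * 4, [0.0] * 4], [[0.0] * 4, [0.0] * 4]]
tokens = [vec for row in feature_map for vec in row]
encoded = add_position_embedding(tokens)
```

The resulting `encoded` sequence is what would then be passed to the transformer encoder layers, so each token carries both its feature content and its position in the grid.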