mahmoodlab/CLAM

Can a ViT with position embeddings be applied to the extracted features?


After feature extraction, can we use a vision transformer (ViT) to process the extracted features? Specifically, is it possible to apply position embeddings to them?
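To make the question concrete, here is a rough sketch of what I have in mind. It is not taken from the CLAM code base. It assumes each slide yields an `(N, D)` feature array plus an `(N, 2)` array of patch coordinates, and it adds a fixed 2D sine-cosine position embedding derived from those coordinates before the tokens would be passed to a transformer. The function names, the 1024-dimensional features, and the 256-pixel patch size are all illustrative assumptions.

```python
import numpy as np


def sincos_1d(pos, dim):
    """Sinusoidal encoding of 1D positions into `dim` channels (dim must be even)."""
    assert dim % 2 == 0
    half = dim // 2
    freqs = 1.0 / (10000.0 ** (np.arange(half) / half))
    angles = pos[:, None] * freqs[None, :]                 # (N, dim / 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (N, dim)


def add_2d_position_embedding(features, coords, patch_size):
    """Add a fixed 2D sin-cos position embedding to per-patch features.

    features:   (N, D) array of patch features, D divisible by 4
    coords:     (N, 2) array of patch (x, y) pixel coordinates (assumed layout)
    patch_size: patch edge length in pixels, used to turn pixels into grid indices
    """
    n, d = features.shape
    assert d % 4 == 0 and coords.shape == (n, 2)
    grid = coords // patch_size          # pixel coordinates -> integer grid indices
    grid = grid - grid.min(axis=0)       # shift so the grid starts at (0, 0)
    pe = np.concatenate(
        [
            sincos_1d(grid[:, 0].astype(np.float64), d // 2),  # x half
            sincos_1d(grid[:, 1].astype(np.float64), d // 2),  # y half
        ],
        axis=1,
    )
    return features + pe


# Toy data standing in for one slide: 6 patches on a 3 x 2 grid.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 1024)).astype(np.float32)
coords = np.array([[0, 0], [256, 0], [512, 0], [0, 256], [256, 256], [512, 256]])

tokens = add_2d_position_embedding(features, coords, patch_size=256)
print(tokens.shape)
```

The resulting `tokens` array keeps the `(N, D)` shape, so it could in principle be fed to a standard transformer encoder as a sequence of N tokens. Because tissue patches do not form a full rectangular grid, the embedding is computed from coordinates rather than from a sequence index. Would this approach be reasonable here?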