huggingface/open-muse

Maxvit

isamu-isozaki opened this issue · 0 comments

Add MaxViT from here for the high-res stage. Code implementations are available in timm and lucidrains.

The basic idea is to split self-attention into block attention (local) and grid attention (global).

maxvit

Doing full global attention is too expensive, and doing only local attention underfits the data.
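
To make the split concrete, here is a rough pure-Python sketch of which token positions end up attending to each other under the two schemes. The function names (`block_groups`, `grid_groups`) and the parameters `p` (window size) and `g` (grid size) are made up for illustration and are not taken from timm or lucidrains; it assumes the feature map height and width divide evenly by `p` and `g`.

```python
def block_groups(height, width, p):
    """Local (block) attention: each group is one contiguous p x p window."""
    assert height % p == 0 and width % p == 0, "assumes even divisibility"
    groups = {}
    for r in range(height):
        for c in range(width):
            # Tokens in the same window share the same window index.
            groups.setdefault((r // p, c // p), []).append((r, c))
    return list(groups.values())


def grid_groups(height, width, g):
    """Global (grid) attention: each group is g x g tokens spread
    uniformly over the whole map with stride (height // g, width // g)."""
    assert height % g == 0 and width % g == 0, "assumes even divisibility"
    stride_r, stride_c = height // g, width // g
    groups = {}
    for r in range(height):
        for c in range(width):
            # Tokens with the same offset inside their stride cell attend together.
            groups.setdefault((r % stride_r, c % stride_c), []).append((r, c))
    return list(groups.values())


if __name__ == "__main__":
    # 4x4 feature map, 2x2 windows and a 2x2 grid.
    print(block_groups(4, 4, 2)[0])  # neighbouring tokens
    print(grid_groups(4, 4, 2)[0])   # tokens spread across the map
```

In both cases each group has a fixed size (`p * p` or `g * g`), so attention is only computed inside small groups instead of over all `height * width` tokens at once, while the grid groups still mix information across the whole map.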