tsb0601/MMVP

Why is a CLIP model created while loading the DINO encoder?

self.clip_vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)