eps696/aphantasia

New CLIP Models

torridgristle opened this issue · 3 comments

OpenAI put out two new CLIP models, RN50x4 and RN101. I've tried them and I genuinely don't know what the visible difference is, but maybe you'll want to add them as options.

sure, done

the difference is pretty huge, in fact. ViT is my favourite so far (for consistency and predictability); RN101 produces an insane visual festival on unclear topics every now and then, but it also manages to keep the macro composition intact (which ViT misses big time), so it's worth fighting with. RN50x4 takes too much memory and at first look was less impressive than the other two.

Interesting. It might just be my specific prompts, but I find that RN50x4 produces much better results than RN101. Or at least, RN50x4 is more "accurate" when my prompt includes a named person.

Using the same prompt, with RN50x4 I get a pretty good representation of that person's face/body, but with RN101 I get a collage of creepy reptiles, fluffy dogs and smiling blonde-haired women (whereas the person in the prompt is a bald man).

thanks for posting your findings. it definitely seems that the various models have learnt different things, so their "usability" heavily depends on the exact application. i never tested CLIP on specific personalities, or even on kinds of people in general; more generic prompts/titles usually resulted in fuzzy, messy stuff with RN50x4 (as if it "knew too much" and could not really decide what to show).
anyway, this variability is the reason to offer a selection of models, so that you can decide what works better in your case.
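Offering the model as an option could look something like the sketch below. This is a hypothetical illustration, not the repo's actual code: it assumes OpenAI's `clip` package (`pip install git+https://github.com/openai/CLIP.git`), whose real entry points are `clip.available_models()` and `clip.load(name, device)`. The `load_clip` helper and the hard-coded model tuple are made up for the example.

```python
# Hypothetical sketch: expose the CLIP variant as a user-selectable option,
# since results differ a lot between models (see discussion above).
# Assumes OpenAI's `clip` package; `load_clip` is an invented helper name.

CLIP_MODELS = ("RN50", "RN101", "RN50x4", "ViT-B/32")

def load_clip(name: str = "ViT-B/32", device: str = "cpu"):
    """Validate the model name, then load it via clip.load().

    The choice is deliberately left to the user rather than hard-coded:
    e.g. ViT tends to be consistent, RN101 keeps macro composition,
    RN50x4 needs the most memory.
    """
    if name not in CLIP_MODELS:
        raise ValueError(f"unknown CLIP model {name!r}; pick one of {CLIP_MODELS}")
    import clip  # imported lazily; requires the openai/CLIP package
    return clip.load(name, device=device)  # returns (model, preprocess)
```

A caller would then do `model, preprocess = load_clip("RN50x4", device="cuda")` and feed `preprocess`-ed images to `model.encode_image`, swapping the name to compare variants on the same prompt.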
the mileage certainly may (and does) vary ::]