[EVA-CLIP] Are you sure about your way to compute probabilities?
Hi,
Thank you so much for these very cool models. In your docs, you compute the probabilities as `text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)`.

I am wondering about the `100.0 *` multiplier. In the original OpenAI model this is justified because `100 = model.logit_scale.exp()` (see the forward pass of CLIP). However, for your models I get very different values:
- If I load `EVA01-g-14` (`laion400m_s11b_b41k`) via open_clip, I get `model.logit_scale.exp() == 56.9953`. So the multiplier should be 56.9953, no? (See the sketch below.)
- If I load `BAAI/EVA-CLIP-8B-448` via Hugging Face, I get `model.logit_scale.exp() == inf`, because `model.scale == 100`. Is this just incorrectly hardcoded?
Obviously, this only changes the confidence, not the accuracy, but could you kindly explain what the correct multiplier is for your models?
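For reference, here is a minimal sketch of what I mean for the open_clip case, assuming the `EVA01-g-14` (`laion400m_s11b_b41k`) weights can be downloaded and using a placeholder local image `cat.jpg`; it reads the multiplier from the checkpoint instead of hardcoding 100.0:

```python
import torch
import open_clip
from PIL import Image

# Load the EVA01-g-14 checkpoint via open_clip (weights are fetched on first use).
model, _, preprocess = open_clip.create_model_and_transforms(
    'EVA01-g-14', pretrained='laion400m_s11b_b41k')
tokenizer = open_clip.get_tokenizer('EVA01-g-14')
model.eval()

image = preprocess(Image.open('cat.jpg')).unsqueeze(0)   # placeholder image path
text = tokenizer(['a photo of a cat', 'a photo of a dog'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    scale = model.logit_scale.exp()   # 56.9953 for this checkpoint, not 100.0
    text_probs = (scale * image_features @ text_features.T).softmax(dim=-1)
```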
Hi @paulgavrikov, apologies for the delayed response. We fixed the logit_scale during the training of the EVA-CLIP-8B and EVA-CLIP-18B models (logit_scale.exp() is consistently set to 100). Please ignore the logit_scale stored in the models on Hugging Face; it was affected by a mistake in the weight-conversion process, but this does not affect the usage of the EVA-CLIP models.
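In other words, for the EVA-CLIP-8B and EVA-CLIP-18B checkpoints the hardcoded `100.0` from the docs is the intended multiplier. A minimal sketch of that scoring step, using dummy pre-normalized features in place of the real encoder outputs (the feature dimension below is arbitrary):

```python
import torch
import torch.nn.functional as F

# Dummy, L2-normalized features standing in for the EVA-CLIP image/text encoder outputs.
image_features = F.normalize(torch.randn(1, 1280), dim=-1)
text_features = F.normalize(torch.randn(3, 1280), dim=-1)

# The temperature was frozen during training (logit_scale.exp() == 100), so use the
# hardcoded value rather than the logit_scale stored in the Hugging Face checkpoint.
logit_scale = 100.0
text_probs = (logit_scale * image_features @ text_features.T).softmax(dim=-1)
print(text_probs)  # shape (1, 3); each row sums to 1
```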