baaivision/EVA

[EVA-CLIP] Are you sure about your way to compute probabilities?

Closed · 1 comment

Hi,

Thank you so much for these very cool models. In your docs, you compute the probabilities as: text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1).

I am wondering about the 100.0 multiplier. In the original OpenAI model, it is used because model.logit_scale.exp() == 100 (see the forward pass of CLIP). However, for your models I get very different values.

  • If I load EVA01-g-14 laion400m_s11b_b41k via open_clip, I get model.logit_scale.exp() == 56.9953. So the multiplier should be 56.9953, no?
  • If I load BAAI/EVA-CLIP-8B-448 via Hugging Face, I get model.logit_scale.exp() == inf, because model.scale == 100. Is this just incorrectly hardcoded?

Obviously, this only changes the confidence, not the accuracy, but could you kindly explain what the correct multiplier is for your models?
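To illustrate the point about confidence versus accuracy, here is a minimal sketch in plain Python. The similarity values are made up for illustration and do not come from any EVA-CLIP model; the softmax helper is a hand-written stand-in for the tensor softmax in the docs snippet. It suggests that switching between the two multipliers leaves the top-ranked prompt unchanged and only changes how peaked the distribution is.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between one image and three text prompts
# (illustrative values only, not model output).
similarities = [0.31, 0.27, 0.22]

for scale in (56.9953, 100.0):
    probs = softmax([scale * s for s in similarities])
    top = max(range(len(probs)), key=probs.__getitem__)
    print(f"scale={scale}: top index={top}, probs={[round(p, 3) for p in probs]}")
```

Because multiplying all logits by the same positive constant preserves their order, the argmax (and therefore top-1 accuracy) is the same for any positive multiplier; a larger multiplier only pushes more probability mass onto the top prompt.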

Hi @paulgavrikov, apologies for the delayed response. We kept logit_scale fixed during the training of the EVA-CLIP-8B and EVA-CLIP-18B models, so logit_scale.exp() is always 100. Please ignore the logit_scale stored in the models on Hugging Face: it was affected by a mistake in the weight conversion process, but this does not affect the usage of EVA-CLIP models.