Different weight decay between code and paper for EVA-CLIP-18B
Novestars commented
In the paper, the weight decay (wd) is 0, while in the codebase wd is left at its default value, which is 0.02.
Quan-Sun commented
@Novestars Appreciate you bringing this to our attention. The wd is set to 0 during the training of both EVA-CLIP-18B and EVA-CLIP-8B models.
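
For anyone reproducing the paper setting, here is a minimal sketch of overriding a default weight decay from the command line. It assumes a training script that exposes a `--wd` flag with a default of 0.02, as described in this thread; the parser and the helper `resolve_weight_decay` are hypothetical illustrations, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: the flag name and default mirror the values
    # mentioned in this thread, not the repository's actual argument list.
    parser = argparse.ArgumentParser()
    parser.add_argument("--wd", type=float, default=0.02, help="weight decay")
    return parser


def resolve_weight_decay(argv: list[str]) -> float:
    # Parse the given argument list and return the effective weight decay.
    return build_parser().parse_args(argv).wd


default_wd = resolve_weight_decay([])             # flag omitted: default applies
paper_wd = resolve_weight_decay(["--wd", "0"])    # explicit override to 0
print(default_wd, paper_wd)
```

If the flag is omitted, the default is silently used, so the override has to be passed explicitly to match the wd of 0 reported for EVA-CLIP-18B and EVA-CLIP-8B.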