We add two linear layers on the image encoder and text encoder of CLIP and use the KL-Divergence as the loss function to fine-tune the CLIP.
The code is based on https://github.com/openai/CLIP
We add two linear layers on the image encoder and text encoder of CLIP and use the KL-Divergence as the loss function to fine-tune the CLIP.
The code is based on https://github.com/openai/CLIP