awesome-unimodal-training

Text-only training or language-free training for multimodal tasks (image/audio/video captioning, retrieval, and text-to-image generation).
