iOPENCap/awesome-unimodal-training
Text-only or language-free training for multimodal tasks (image/audio/video captioning, retrieval, text-to-image)