LAVISH

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Primary language: Python
