
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Primary language: Jupyter Notebook. License: MIT.
