ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

PyTorch implementation of ViT-TTS (EMNLP'23).

The code will be released soon.