Multilingual Video Localization Software for Dubbing Using AI

Automatic dubbing of videos from English to Indian regional languages for educational purposes.

About

The globalization of media and the expansive reach of the internet have spurred a significant demand for video content that can transcend linguistic boundaries. Traditional localization methods such as subtitling and manual dubbing face challenges in cost, time, and quality, motivating the development of the Multilingual Video Localization Software (MVLS).

Features

  1. Enable multilingual dubbing
  2. Ensure quality and scalability
  3. Provide a user-friendly interface

Requirements

Hardware:

  - Microphone: required to capture audio from various sources.
  - Computer or server: sufficient processing power and memory to run the audio processing and translation algorithms effectively.
  - GPU (Graphics Processing Unit): for accelerating the deep learning models used in speech recognition and translation.
  - Storage: sufficient space to store audio files, intermediate data, and models.
  - Network connectivity: reliable internet access for cloud-based services and APIs, such as the Google Cloud Speech-to-Text API and the Google Cloud Translation API.

Software:

  - Python: the primary programming language for implementing the audio processing and translation algorithms.
  - Pydub: Python library for audio processing tasks such as extracting audio from video files, merging audio files, and exporting audio files.
  - MoviePy: Python library for video editing tasks, including audio extraction and replacement.
  - Google Cloud SDK: software development kit for accessing Google Cloud services, including the Speech-to-Text API and the Translation API.
  - spaCy: Python library for natural language processing tasks such as text tokenization and part-of-speech tagging.
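As a minimal sketch of how these pieces fit together, the snippet below extracts the audio track with MoviePy, transcribes it with the Google Cloud Speech-to-Text API, and translates the transcript with the Google Cloud Translation API. The file names, sample rate, and target language (Hindi) are illustrative assumptions, not fixed by the project.

```python
from moviepy.editor import VideoFileClip  # MoviePy 1.x import path
from google.cloud import speech
from google.cloud import translate_v2 as translate

# Assumed input file; any English-language lecture video would do.
VIDEO_PATH = "lecture.mp4"
AUDIO_PATH = "lecture.wav"

# 1. Extract the audio track from the video (MoviePy),
#    downmixed to 16 kHz mono for speech recognition.
clip = VideoFileClip(VIDEO_PATH)
clip.audio.write_audiofile(AUDIO_PATH, fps=16000, ffmpeg_params=["-ac", "1"])

# 2. Transcribe the English audio (Google Cloud Speech-to-Text).
#    Note: synchronous recognize() is limited to short clips; long lectures
#    would need long_running_recognize() with audio staged in Cloud Storage.
stt_client = speech.SpeechClient()
with open(AUDIO_PATH, "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
response = stt_client.recognize(config=config, audio=audio)
transcript = " ".join(r.alternatives[0].transcript for r in response.results)

# 3. Translate the transcript to a regional language (Google Cloud Translation).
translator = translate.Client()
result = translator.translate(transcript, target_language="hi")  # "hi" = Hindi
print(result["translatedText"])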

System Architecture

(System architecture diagram)

Output

Output 1: Translated audio file generation (screenshot)

Output 2: Dubbed video output received (screenshot)
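A sketch of the final muxing step behind Output 2, assuming the translated narration has already been synthesized to a hypothetical file lecture_hi.mp3: MoviePy replaces the original audio track and writes the dubbed video.

```python
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip("lecture.mp4")
dubbed_track = AudioFileClip("lecture_hi.mp3")  # TTS output from the previous stage

# Keep the narration no longer than the video so the container stays in sync;
# if the narration is shorter, the remainder is simply silent.
narration = dubbed_track.set_duration(min(dubbed_track.duration, video.duration))
dubbed = video.set_audio(narration)
dubbed.write_videofile("lecture_hi.mp4", audio_codec="aac")
```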

Results & Impact

By pioneering automated video dubbing, the Multilingual Video Localization Software redefines multimedia localization, dissolving language barriers and opening wide possibilities for creators, educators, and businesses on a global scale. It leverages the Deepgram API for speech recognition, TTS-generated voice-overs, and prosodic alignment for superior quality.
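As a hedged illustration of the recognition stage mentioned above, the call below sends a WAV file to Deepgram's pre-recorded transcription endpoint over plain HTTP; the API key and file name are placeholders, and the response schema shown is the one documented for that endpoint.

```python
import requests

# Placeholder credentials and input file.
DEEPGRAM_API_KEY = "YOUR_DEEPGRAM_API_KEY"

with open("lecture.wav", "rb") as audio_file:
    resp = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        data=audio_file,
    )
resp.raise_for_status()

# Deepgram returns per-channel alternatives; take the top hypothesis.
transcript = resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"]
print(transcript)
```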

References

  1. J. Li et al., "Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 517-528, 2024, doi: 10.1109/TASLP.2023.3331813.
  2. D. Bigioi, H. Jordan, R. Jain, R. McDonnell and P. Corcoran, "Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing," in IEEE Access, vol. 10, pp. 133357-133369, 2022, doi: 10.1109/ACCESS.2022.3231137.
  3. Z. Huijuan, Y. Ning and W. Ruchuan, "Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation," in Chinese Journal of Electronics, vol. 32, no. 3, pp. 640-646, May 2023, doi: 10.23919/cje.2021.00.196.
  4. C. Lu, Y. Zong, W. Zheng, Y. Li, C. Tang and B. W. Schuller, "Domain Invariant Feature Learning for Speaker-Independent Speech Emotion Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2217-2230, 2022, doi: 10.1109/TASLP.2022.3178232.
  5. S. Li, P. Song and W. Zheng, "Multi-Source Discriminant Subspace Alignment for Cross-Domain Speech Emotion Recognition," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2448-2460, 2023, doi: 10.1109/TASLP.2023.3288415.