SSMT: Speech-to-Speech Machine Translation System

Current Deployment

Set the port number on which the backend runs in the uvicorn_worker.py file.
Set the number of workers in the uvicorn_worker.py file. The number of workers is the number of SSMT pipeline instances to load.
Run the uvicorn_worker.py file with the command: python3 uvicorn_worker.py
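The contents of uvicorn_worker.py are not shown here. The following is a minimal sketch of how the two settings could feed uvicorn.run(). It assumes the backend is an ASGI app importable as "app:app" and that the settings are module-level constants named PORT and NUM_WORKERS. The module path, the constant names, their values, and the 0.0.0.0 host are all hypothetical.

```python
"""Hypothetical sketch of the port and worker settings in uvicorn_worker.py.

The app import string "app:app", the constant names and their values are
assumptions for illustration, not taken from the actual file.
"""

PORT = 8000       # port the backend listens on (hypothetical value)
NUM_WORKERS = 2   # number of SSMT pipeline instances to load (hypothetical value)


def build_server_kwargs(port: int, workers: int) -> dict:
    """Validate the settings and return keyword arguments for uvicorn.run()."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    if workers < 1:
        raise ValueError(f"need at least one worker, got {workers}")
    # Binding to all interfaces is an assumption; adjust for your deployment.
    return {"host": "0.0.0.0", "port": port, "workers": workers}


def main() -> None:
    # Imported here so the helper above can be used without uvicorn installed.
    import uvicorn

    # With workers > 1, uvicorn requires the app as an import string, not an object.
    uvicorn.run("app:app", **build_server_kwargs(PORT, NUM_WORKERS))
```

The sketch does not start the server on import. Calling main(), for example from an if __name__ == "__main__": guard at the bottom of the file, launches it. Passing the app as an import string matters because uvicorn refuses to start multiple workers otherwise: each worker process re-imports the app, and with it its own SSMT pipeline.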

The SSMT pipeline consists of three models: Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS).
The input speech is passed to the ASR model, which transcribes it into text in the source language.
The source-language text is passed to the MT model, which translates it into target-language text.
The target-language text is passed to the TTS model, which generates speech in the target language.
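The three stages compose as plain function calls, each feeding its output to the next. The following is a minimal sketch. SSMTPipeline and the stage signatures (audio as bytes, text as str) are hypothetical stand-ins, not the project's actual classes, and the toy lambdas replace real models so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the real ASR, MT and TTS models are not shown here.
AsrFn = Callable[[bytes], str]   # source speech -> source-language text
MtFn = Callable[[str], str]      # source-language text -> target-language text
TtsFn = Callable[[str], bytes]   # target-language text -> target speech


@dataclass
class SSMTPipeline:
    asr: AsrFn
    mt: MtFn
    tts: TtsFn

    def translate(self, source_audio: bytes) -> bytes:
        source_text = self.asr(source_audio)  # step 1: transcribe source speech
        target_text = self.mt(source_text)    # step 2: translate the text
        return self.tts(target_text)          # step 3: synthesize target speech


# Toy stand-ins so the sketch runs without any models.
demo = SSMTPipeline(
    asr=lambda audio: audio.decode("utf-8"),
    mt=lambda text: text.upper(),
    tts=lambda text: text.encode("utf-8"),
)
print(demo.translate(b"hello"))  # prints b'HELLO'
```

Keeping each stage behind a simple callable means one pipeline instance holds exactly one ASR, one MT and one TTS model. This matches the idea of loading a whole pipeline as the unit of placement on a GPU.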

The code can run multiple SSMT pipelines on a single GPU as well as across multiple GPUs.
Before loading a pipeline, the code checks the free memory on a GPU; if enough memory is available, the models are loaded on that GPU.
If not enough free memory is available, the next GPU on the machine is checked.
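The first-fit check described above can be sketched as follows. The function names are hypothetical. Free memory is passed in as a list of byte counts so the logic runs without a GPU. On a real machine, one option is torch.cuda.mem_get_info(i), which returns (free, total) in bytes for device i. Whether the SSMT code uses that call is an assumption.

```python
from typing import List, Optional

GiB = 1024 ** 3


def pick_gpu(free_bytes: List[int], required_bytes: int) -> Optional[int]:
    """Return the first GPU index with enough free memory, or None if none fits."""
    for index, free in enumerate(free_bytes):
        if free >= required_bytes:
            return index
    return None


def place_pipelines(free_bytes: List[int], required_bytes: int, count: int) -> List[int]:
    """Place up to `count` pipelines first-fit; return the GPU index of each one placed."""
    remaining = list(free_bytes)  # copy so the caller's list is not modified
    placements: List[int] = []
    for _ in range(count):
        gpu = pick_gpu(remaining, required_bytes)
        if gpu is None:
            break  # no GPU has room for another pipeline
        remaining[gpu] -= required_bytes  # account for the pipeline just loaded
        placements.append(gpu)
    return placements
```

With eight GPUs reporting 80 GiB free each and 6 GiB per pipeline, place_pipelines([80 * GiB] * 8, 6 * GiB, 200) places 104 pipelines, 13 per GPU, before every GPU runs out of room. These figures are idealized. Real free-memory readings are lower than the nominal capacity because of CUDA context overhead, so actual counts can be smaller.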
Example: Consider a DGX A100 machine with 8 NVIDIA A100 GPUs (80 GB each), where one SSMT pipeline occupies 6 GB of GPU memory. A single GPU can then run 13 SSMT pipelines, so across all 8 GPUs a total of 13 × 8 = 104 SSMT pipelines can run.
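The per-GPU figure is a floor division of GPU memory by pipeline size. Assuming 80 GB A100s (13 × 6 GB = 78 GB fits, 14 × 6 GB = 84 GB does not), the arithmetic is:

```latex
N_{\text{per GPU}} = \left\lfloor \frac{80\ \text{GB}}{6\ \text{GB}} \right\rfloor = \lfloor 13.33\ldots \rfloor = 13,
\qquad
N_{\text{total}} = 8 \times N_{\text{per GPU}} = 8 \times 13 = 104.
```

The same formula on the 40 GB A100 variant gives \lfloor 40/6 \rfloor = 6 pipelines per GPU, or 48 in total.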
In this way, the code dynamically loads models across the GPUs of a multi-GPU machine to use all of the available GPU memory.