This project uses the Google Cloud Speech-to-Text API to transcribe speech, the DeepL API to translate the transcribed text, and the ElevenLabs API to convert the translated text back to speech. Together these form a speech-to-speech translation pipeline.
Before running this project, ensure you have the following dependencies installed:
- Python 3.7 or later
- Google Cloud SDK (gcloud)
- PyAudio
- Requests
- Pygame
- DeepL API key
- ElevenLabs API key
- Clone the repository:
    git clone https://github.com/bykemalh/S2ST.git
    cd S2ST
- Set up a virtual environment:
    python3 -m venv env
    source env/bin/activate  # On Windows use `env\Scripts\activate`
- Install the required Python packages:
    pip install google-cloud-speech pyaudio deepl requests pygame
- Install Google Cloud SDK: Follow the official installation instructions for your operating system.
- Authenticate with Google Cloud:
    gcloud auth login
    gcloud auth application-default login
- Enable the Google Cloud Speech-to-Text API:
    gcloud services enable speech.googleapis.com
- Set up API keys: Replace the placeholder values in the script with your actual DeepL and ElevenLabs API keys.
    auth_key = "your-deepl-auth-key"
    xi_api_key = "your-elevenlabs-api-key"
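As an alternative to pasting keys into the script, they could be read from environment variables so they never end up in version control. The following is a minimal sketch, not part of the original script: the variable names `DEEPL_AUTH_KEY` and `ELEVENLABS_API_KEY` and the helper `load_api_keys` are hypothetical.

```python
import os


def load_api_keys(env=os.environ):
    """Return (auth_key, xi_api_key) read from environment variables.

    The variable names below are assumptions chosen for this sketch;
    the original script expects the keys to be edited in place.
    """
    try:
        return env["DEEPL_AUTH_KEY"], env["ELEVENLABS_API_KEY"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None
```

With both variables exported in the shell, `auth_key, xi_api_key = load_api_keys()` would replace the two hard-coded assignments.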
To run the application, execute the S2ST_NewAdvanced.py script:
python S2ST_NewAdvanced.py
- Audio Input:
  - The application opens a microphone stream using the pyaudio library and captures audio in real time.
- Speech-to-Text:
  - The captured audio is sent to the Google Cloud Speech-to-Text API, which returns the transcribed text.
- Translation:
  - The transcribed text is translated to English using the DeepL API.
- Text-to-Speech:
  - The translated text is sent to the ElevenLabs API, which converts it to speech. The application then plays the audio back.
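The four stages above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the code in S2ST_NewAdvanced.py: the function names are hypothetical, the source `language_code` is left to the caller because the README does not state it, `"EN-US"` is one possible English target for DeepL, and the ElevenLabs request uses the public REST text-to-speech endpoint with a minimal payload. Third-party libraries are imported inside each function so the module loads even where they are not installed.

```python
import tempfile

RATE = 16000   # sample rate in Hz (the README's default)
CHUNK = 1600   # chunk size per read (the README's default)


def mic_chunks(rate=RATE, chunk=CHUNK):
    """Yield raw 16-bit mono audio chunks from the default microphone."""
    import pyaudio

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                     input=True, frames_per_buffer=chunk)
    try:
        while True:
            yield stream.read(chunk)
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()


def transcribe(chunks, language_code, rate=RATE):
    """Yield final transcripts from Google Cloud streaming recognition."""
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=rate,
        language_code=language_code,  # assumption: supplied by the caller
    )
    streaming_config = speech.StreamingRecognitionConfig(config=config)
    requests = (speech.StreamingRecognizeRequest(audio_content=c)
                for c in chunks)
    for response in client.streaming_recognize(streaming_config, requests):
        for result in response.results:
            if result.is_final:
                yield result.alternatives[0].transcript


def translate(text, auth_key, target_lang="EN-US"):
    """Translate text with DeepL and return the translated string."""
    import deepl

    translator = deepl.Translator(auth_key)
    return translator.translate_text(text, target_lang=target_lang).text


def build_tts_request(text, voice_id, xi_api_key):
    """Build the URL, headers, and JSON payload for ElevenLabs."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {
        "xi-api-key": xi_api_key,
        "Content-Type": "application/json",
        "Accept": "audio/mpeg",
    }
    return url, headers, {"text": text}


def speak(text, voice_id, xi_api_key):
    """Fetch MP3 audio from ElevenLabs and play it with pygame."""
    import pygame
    import requests

    url, headers, payload = build_tts_request(text, voice_id, xi_api_key)
    response = requests.post(url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    # The temporary file is left on disk; a real script would clean it up.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        f.write(response.content)
        path = f.name
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)
```

A driver loop would then chain the stages, for example `for sentence in transcribe(mic_chunks(), language_code): speak(translate(sentence, auth_key), voice_id, xi_api_key)`, with `language_code` set to the speaker's language.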
Ensure you have the following libraries installed:
- google-cloud-speech
- pyaudio
- deepl
- requests
- pygame
You can install these dependencies using the following command:
pip install google-cloud-speech pyaudio deepl requests pygame
Modify the following variables in the script to match your settings:
- auth_key: Your DeepL API key.
- xi_api_key: Your ElevenLabs API key.
- voice_id: The voice ID to be used with the ElevenLabs API.
- RATE: The audio sample rate (default is 16000).
- CHUNK: The audio chunk size (default is 1600).
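To see how RATE and CHUNK relate, the short calculation below works out how much audio one chunk holds at the default values. It assumes CHUNK counts frames of 16-bit mono audio, which the README does not state explicitly. The helper names are hypothetical.

```python
RATE = 16000   # samples per second (default from the README)
CHUNK = 1600   # assumed to be frames per chunk


def chunk_duration_ms(rate, chunk):
    """Length of one chunk in milliseconds."""
    return 1000 * chunk / rate


def chunk_bytes(chunk, sample_width=2, channels=1):
    """Size of one chunk in bytes, assuming 16-bit (2-byte) mono samples."""
    return chunk * sample_width * channels


print(chunk_duration_ms(RATE, CHUNK))  # 100.0
print(chunk_bytes(CHUNK))              # 3200
```

With the defaults, each chunk carries a tenth of a second of audio. A smaller CHUNK would lower latency at the cost of more frequent reads and requests.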
Logging is set up in the script to capture errors during the text-to-speech conversion process. You can enable more detailed logging by uncommenting the logging configuration line.
# logging.basicConfig(level=logging.DEBUG)
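As an illustration of the kind of error capture described above, the sketch below wraps a text-to-speech call and logs any failure instead of letting it stop the program. The logger name `s2st` and the helper `safe_speak` are hypothetical and are not taken from the script.

```python
import logging

logger = logging.getLogger("s2st")  # hypothetical logger name


def safe_speak(speak, text):
    """Call a text-to-speech function; log and swallow any exception.

    Returns True on success and False if the call raised.
    """
    try:
        speak(text)
        return True
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("Text-to-speech conversion failed for %r", text)
        return False
```

Calling `logging.basicConfig(level=logging.DEBUG)` before the first call would also surface debug-level messages from the libraries in use.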
This project is licensed under the MIT License. See the LICENSE file for details.
If you wish to contribute to this project, please fork the repository and create a pull request.
This algorithm was developed by Kemal Hafızoğlu.