Speech-to-Text and Text-to-Speech Conversion with OpenAI

This repository contains Python code that uses OpenAI's GPT-3 and Whisper models to perform Speech-to-Text (STT) and Text-to-Speech (TTS) conversion. The code records audio from the microphone, transcribes it with OpenAI's Whisper STT model, generates a response with GPT-3, and converts the response text to speech with OpenAI's TTS model. This README provides an overview of the code and its usage.

Prerequisites

Before using this code, ensure you have the following prerequisites installed:

  1. Python 3.x
  2. OpenAI Python SDK (openai)
  3. PyAudio (pyaudio)
  4. Wave (wave, included in the Python standard library)
  5. PyDub (pydub)
  6. python-dotenv (imported as dotenv)
  7. Pygame (pygame)

You should also have an OpenAI API key, which you can obtain by signing up for an account on the OpenAI platform.
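The prerequisite list above roughly corresponds to the following requirements.txt. The exact contents and version pins shipped with the repository may differ; wave is part of the standard library and is not installed separately:

```text
openai
pyaudio
pydub
python-dotenv
pygame
```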

Usage

1. Setup

  1. Clone this repository to your local machine.
  2. Install the required Python packages listed above using pip install -r requirements.txt.
  3. Create a .env file in the project directory with your OpenAI API key:
PROJECT_API_KEY=your_api_key_here
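
A minimal sketch of how the key is typically loaded with python-dotenv and passed to the OpenAI client (assumes the openai>=1.0 SDK and the PROJECT_API_KEY variable name shown above):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads the .env file in the project directory
client = OpenAI(api_key=os.getenv("PROJECT_API_KEY"))
```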

2. Recording and Transcription

  • The code records audio for a specified duration (5 seconds by default) using the PyAudio library.
  • The recorded audio is then saved as an MP3 file named input.mp3.
  • The openai library is used to transcribe the audio using the Whisper STT model.
  • The transcribed text is stored in the transcript variable.
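A rough sketch of the recording and transcription steps. The parameter values and the intermediate input.wav file are illustrative rather than taken from the script; it assumes the openai>=1.0 SDK, the client from the setup sketch above, and that ffmpeg is available for PyDub's MP3 export:

```python
import wave

import pyaudio
from pydub import AudioSegment

CHUNK, RATE, SECONDS = 1024, 44100, 5

# Record 5 seconds of mono 16-bit audio from the default microphone
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

# Write the raw frames to a WAV file, then convert to input.mp3 with PyDub
with wave.open("input.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
AudioSegment.from_wav("input.wav").export("input.mp3", format="mp3")

# Transcribe the recording with the Whisper STT model
with open("input.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcript.text)
```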

3. Chat Completion

  • The transcribed text is used as a prompt for the GPT-3 model to generate a response.
  • The code sends a system message and a user message to the GPT-3 model.
  • The response generated by GPT-3 is extracted and printed to the console.
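A sketch of the chat-completion step, reusing the client and transcript from the sketches above. The model name and system prompt here are assumptions, not necessarily what the script uses:

```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name
    messages=[
        {"role": "system", "content": "You are a helpful voice assistant."},  # example system message
        {"role": "user", "content": transcript.text},  # transcribed speech as the user message
    ],
)
reply = response.choices[0].message.content
print(reply)
```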

4. Text-to-Speech Conversion

  • The generated response from GPT-3 is passed to the OpenAI TTS model for conversion.
  • The TTS model generates an MP3 file named blah.mp3 containing the synthesized speech.
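A sketch of the TTS call. The model and voice names are assumptions; stream_to_file is provided by the openai>=1.0 SDK's binary response object:

```python
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.stream_to_file("blah.mp3")  # write the synthesized speech to blah.mp3
```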

5. Playback

  • The Pygame library is used to play the synthesized speech.
  • The code loads the blah.mp3 file and plays it through the speakers.
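Playback with Pygame's mixer typically looks like this sketch:

```python
import pygame

pygame.mixer.init()
pygame.mixer.music.load("blah.mp3")
pygame.mixer.music.play()
while pygame.mixer.music.get_busy():  # block until playback finishes
    pygame.time.Clock().tick(10)
```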

6. Cleanup

  • The generated audio files (input.mp3 and blah.mp3) remain in the project directory after each run; delete or reuse them as needed.

Important Notes

  • Make sure your microphone is properly configured and connected to your computer to record audio.
  • You can customize the recording parameters such as duration and audio format in the code.

License

This code is provided under the MIT License for personal and open-source use. Please refer to the license file for more details.

Acknowledgments

This code uses the OpenAI GPT-3 and Whisper models. Make sure to review OpenAI's usage policies and pricing details on their website.

Feel free to modify and extend this code as needed and provide proper attribution to OpenAI when using their models. Enjoy experimenting with Speech-to-Text and Text-to-Speech conversion!