This Voice Assistant project uses the Google Cloud Speech-to-Text and Text-to-Speech APIs together with OpenAI's GPT-4-based API to create a voice-controlled assistant that listens to the user's voice commands and responds in spoken language.
- Clone the repository:
git clone https://github.com/aaronmansfield5/Speech-To-Text-AI.git
- Install the dependencies:
cd Speech-To-Text-AI
npm install node-record-lpcm16 @google-cloud/speech @google-cloud/text-to-speech openai shelljs
- Add your Google Cloud project's projectId and keyFilename to the app.js and manageAudio.js files.
- Add your OpenAI API key to the configuration object in app.js.
- Install VLC Media Player.
- Install Chocolatey.
- Install SoX from an elevated Command Prompt or PowerShell:
choco install sox.portable
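As a rough illustration of the two configuration steps above, the settings could be collected and checked before the app starts. This is only a sketch: the placeholder values, the OPENAI_API_KEY environment variable, and the missingSettings helper are assumptions for illustration and are not taken from app.js.

```javascript
// Placeholder values; substitute your own project ID and key file path.
const googleOptions = {
  projectId: 'my-gcp-project',
  keyFilename: './service-account-key.json',
};

// Assumption: the key is read from an environment variable instead of being
// written directly into the configuration object.
const openAiOptions = { apiKey: process.env.OPENAI_API_KEY || '' };

// Hypothetical check: return the names of settings that are still empty.
function missingSettings(google, openAi) {
  const required = {
    projectId: google.projectId,
    keyFilename: google.keyFilename,
    apiKey: openAi.apiKey,
  };
  return Object.keys(required).filter((key) => !required[key]);
}

const missing = missingSettings(googleOptions, openAiOptions);
if (missing.length > 0) {
  console.warn(`Missing settings: ${missing.join(', ')}`);
}
```

Failing early on an empty setting tends to give a clearer message than an authentication error raised later by one of the API clients.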
- Start the application:
node app.js
- Speak a command prefixed with the listener's name, for example:
alexa what is the weather like today?
The Voice Assistant will process the command and provide a spoken response.
app.js is the main script. It handles voice recognition, command processing, and the call to the OpenAI API. It listens to the user's voice input, transcribes it using Google's Speech-to-Text API, and checks whether the transcription starts with the listener's name. If it does, it sends the command to the OpenAI API and passes the response to the manageAudio.js module.
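The listener-name check described above could look roughly like the following. This is a minimal sketch, not the code from app.js: the extractCommand helper and its handling of punctuation after the name are assumptions.

```javascript
// Hypothetical helper: returns the command that follows the listener's name,
// or null when the transcript is not addressed to the assistant.
function extractCommand(transcript, listenerName) {
  const text = transcript.trim();
  const name = listenerName.trim().toLowerCase();
  if (name === '' || text.slice(0, name.length).toLowerCase() !== name) {
    return null;
  }
  const rest = text.slice(name.length);
  // Require a separator after the name so "alexander ..." does not match "alexa".
  if (rest !== '' && !/^[\s,.!?:;]/.test(rest)) {
    return null;
  }
  // Drop the separators between the name and the command.
  // The result is an empty string when only the name was spoken.
  return rest.replace(/^[\s,.!?:;]+/, '');
}

console.log(extractCommand('alexa what is the weather like today?', 'alexa'));
console.log(extractCommand('what time is it', 'alexa'));
```

The first call prints the command without the name, and the second prints null because the transcript does not start with the listener's name. Matching is case-insensitive because speech transcripts often capitalize the first word.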
manageAudio.js handles the text-to-speech conversion and audio playback. It uses Google's Text-to-Speech API to convert the OpenAI API response into an audio file (output.wav), then plays that file using VLC media player.
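The synthesize-then-play flow described above might be sketched as follows. It is an illustration under stated assumptions, not the contents of manageAudio.js: the voice settings, the speak, buildSynthesisRequest, and buildVlcArgs helpers, and the use of Node's child_process in place of shelljs are all choices made for this example, and it assumes the vlc executable is on the PATH.

```javascript
const fs = require('fs');
const { spawn } = require('child_process');

// Build the Text-to-Speech request. LINEAR16 audio is returned with a WAV
// header, which fits the output.wav file name.
function buildSynthesisRequest(text) {
  return {
    input: { text },
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' }, // assumed voice
    audioConfig: { audioEncoding: 'LINEAR16' },
  };
}

// Arguments for running VLC without its GUI and quitting after playback.
function buildVlcArgs(file) {
  return ['--intf', 'dummy', '--play-and-exit', file];
}

// clientOptions is expected to carry projectId and keyFilename.
async function speak(text, clientOptions) {
  // Loaded lazily so the helpers above work without the dependency installed.
  const textToSpeech = require('@google-cloud/text-to-speech');
  const client = new textToSpeech.TextToSpeechClient(clientOptions);
  const [response] = await client.synthesizeSpeech(buildSynthesisRequest(text));
  fs.writeFileSync('output.wav', response.audioContent);
  // Assumption: "vlc" resolves on PATH; otherwise pass the full executable path.
  spawn('vlc', buildVlcArgs('output.wav'), { stdio: 'ignore' });
}
```

A call such as speak('Hello there', { projectId, keyFilename }) would write output.wav and then start playback. Keeping the request and argument builders separate from the API call makes them easy to check without network access.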
Please feel free to submit issues and pull requests for improvements and bug fixes.