Speech-To-Text AI

This Voice Assistant project uses the Google Cloud Speech-to-Text and Text-to-Speech APIs together with OpenAI's GPT-4-based API to create a voice-controlled assistant that listens for the user's voice commands and answers in spoken language.

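Prerequisites

Node.js and npm.
A Google Cloud project with the Speech-to-Text and Text-to-Speech APIs enabled, plus a service account key file.
An OpenAI API key.
VLC Media Player and SoX (see Installation).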

Installation

Clone the repository:

git clone https://github.com/aaronmansfield5/Speech-To-Text-AI.git

Install the dependencies:

cd Speech-To-Text-AI
npm install node-record-lpcm16 @google-cloud/speech @google-cloud/text-to-speech openai shelljs

Add your Google Cloud project's projectId and keyFilename to the app.js and manageAudio.js files.
Add your OpenAI API key to the configuration object in app.js.
Install VLC Media Player.
Install Chocolatey (a package manager for Windows).
Install SoX from an elevated Command Prompt or PowerShell:

choco install sox.portable
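
Before starting the application, it can help to confirm that the projectId, keyFilename, and OpenAI API key from the steps above have actually been filled in. The following is a minimal sketch only: the settings object, its field names, and the findMissingSettings helper are hypothetical and are not taken from app.js, whose real configuration objects may be shaped differently.

```javascript
// Hypothetical settings shape for illustration; the real objects in app.js
// and manageAudio.js may use different names.
const settings = {
  projectId: "my-gcp-project",            // placeholder project ID
  keyFilename: "./service-account.json",  // placeholder path to the key file
  openaiApiKey: "",                       // left empty on purpose for this demo
};

// Returns the names of required settings that are missing or blank.
function findMissingSettings(config) {
  const required = ["projectId", "keyFilename", "openaiApiKey"];
  return required.filter(
    (key) => typeof config[key] !== "string" || config[key].trim() === ""
  );
}

const missing = findMissingSettings(settings);
if (missing.length > 0) {
  console.log("Missing settings: " + missing.join(", "));
} else {
  console.log("All settings are present.");
}
```

A check like this only catches blank values; it does not verify that the key file exists or that the API key is valid.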

Usage

Start the application:

node app.js

Speak a command prefixed with the listener's name, for example:

alexa what is the weather like today?

The Voice Assistant will process the command and provide a spoken response.

Modules

app.js

This is the main script that handles voice recognition, command processing, and calling the OpenAI API for a response. It listens to the user's voice input, transcribes it using Google's Speech-to-Text API, and checks if the transcription starts with the listener's name. If it does, it sends the command to the OpenAI API to get a response and passes it to the manageAudio.js module.
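
The listener-name check described above could look roughly like the sketch below. The extractCommand function and its default name "alexa" are assumptions made for illustration. The actual logic in app.js may differ, for example in how it treats punctuation or letter case.

```javascript
// Hypothetical helper: returns the command that follows the listener's name,
// or null when the transcription is not addressed to the assistant.
function extractCommand(transcript, listenerName = "alexa") {
  const text = transcript.trim();
  const name = listenerName.toLowerCase();

  // The transcription must begin with the listener's name (case-insensitive).
  if (!text.toLowerCase().startsWith(name)) return null;

  // Require a word boundary so that "alexander ..." does not match "alexa".
  const rest = text.slice(name.length);
  if (rest.length > 0 && !/^[\s,.!?:]/.test(rest)) return null;

  // Drop the separators between the name and the command itself.
  const command = rest.replace(/^[\s,.!?:]+/, "");
  return command.length > 0 ? command : null;
}

console.log(extractCommand("alexa what is the weather like today?"));
console.log(extractCommand("what is the weather like today?"));
```

Returning null for anything not addressed to the assistant keeps background speech from being sent to the OpenAI API.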

manageAudio.js

This script handles the text-to-speech conversion and audio playback. It uses Google's Text-to-Speech API to convert the OpenAI API response into an audio file (output.wav), then plays that file with VLC Media Player.
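
As a rough illustration of the two steps above, the sketch below builds a request body in the shape accepted by the Text-to-Speech client's synthesizeSpeech call, along with a VLC command line for playback. The helper names, the en-US voice, and the VLC flags are assumptions rather than what manageAudio.js actually does, and the command assumes vlc is available on the PATH.

```javascript
// Hypothetical helper: request body for a synthesizeSpeech call.
// LINEAR16 is the encoding that yields WAV audio, matching output.wav.
function buildSynthesisRequest(text, languageCode = "en-US") {
  return {
    input: { text },
    voice: { languageCode, ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "LINEAR16" },
  };
}

// Hypothetical helper: command line that plays a file in VLC without the
// graphical interface and exits when playback finishes.
function buildPlaybackCommand(file = "output.wav") {
  return `vlc --intf dummy --play-and-exit "${file}"`;
}

const request = buildSynthesisRequest("It is sunny today.");
console.log(JSON.stringify(request, null, 2));
console.log(buildPlaybackCommand());
```

In a full implementation the request would be passed to the Text-to-Speech client, the returned audio content written to output.wav, and the command run through a shell helper such as shelljs.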

Contributing

Please feel free to submit issues and pull requests for improvements and bug fixes.