This repository contains a voice-based AI assistant that uses speech recognition for input and text-to-speech for responses. The application consists of a React frontend for user interaction and a backend that leverages various AI services for processing and responding to queries.
https://drive.google.com/file/d/1Br5arurH0bYXXIDY2zXVXFr-4myQ1d-5/view?usp=drivesdk
The frontend is built with React and TypeScript, featuring a voice-based interface that:
- Captures user speech via the microphone
- Displays the transcribed text
- Sends the text to the backend for processing
- Receives and speaks the AI response
The application uses the Web Speech API, specifically:
- SpeechRecognition API (`webkitSpeechRecognition`):

```typescript
// Create a speech recognition instance
const recognition = new (window as any).webkitSpeechRecognition();

// Configuration
recognition.lang = "en-US";
recognition.continuous = true;
recognition.interimResults = true;
```

- Speech Synthesis API (`window.speechSynthesis`):

```typescript
// Access the speech synthesis API
const synth = window.speechSynthesis;

// Create an utterance and configure it
const utterance = new SpeechSynthesisUtterance(text);
utterance.lang = "en-US";
utterance.rate = 1;
utterance.pitch = 1;

// Speak the text
synth.speak(utterance);
```

Key APIs used:

- `webkitSpeechRecognition`: Browser API for speech recognition
- `recognition.start()`: Begins listening for speech
- `recognition.stop()`: Stops listening for speech
- `recognition.onstart`: Event handler fired when recording begins
- `recognition.onresult`: Event handler fired when speech is recognized
- `recognition.onerror`: Event handler for recognition errors
- `recognition.onend`: Event handler fired when recording ends
- `window.speechSynthesis`: Browser API for text-to-speech
- `SpeechSynthesisUtterance`: Creates a speech synthesis request
- `synth.speak()`: Speaks the provided text
- `synth.cancel()`: Stops any ongoing speech
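As a reference for how these pieces fit together, here is a minimal sketch of wiring up the recognition events; the actual component logic in this repository may differ:

```typescript
// Minimal sketch: configure recognition and hook up its event handlers.
const recognition = new (window as any).webkitSpeechRecognition();
recognition.lang = "en-US";
recognition.continuous = true;
recognition.interimResults = true;

recognition.onstart = () => console.log("Listening...");

recognition.onresult = (event: any) => {
  // Concatenate the transcripts of all results received so far
  const transcript = Array.from(event.results)
    .map((result: any) => result[0].transcript)
    .join("");
  console.log("Transcript:", transcript);
};

recognition.onerror = (event: any) => console.error("Recognition error:", event.error);
recognition.onend = () => console.log("Stopped listening");

recognition.start(); // begin capturing audio from the microphone
```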
The backend uses several API keys and environment variables, configured in a `.env` file:

```
GEMINI_API_KEY=
COHERE_API_KEY=
PINECONE_API_KEY=
PINECONE_INDEX=
PINECONE_ENVIRONMENT=
PINECONE_HOST=
```
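One possible pattern, shown only as a sketch, is to load these with `dotenv` and fail fast at startup if a required key is missing:

```typescript
// Sketch: validate required environment variables at startup (assumes dotenv is installed).
import "dotenv/config";

const required = [
  "GEMINI_API_KEY",
  "COHERE_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_INDEX",
] as const;

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}
```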
- Data Source: JSON format data
- Chunking: The JSON data is split into smaller chunks for efficient processing
- Vector Database: Chunks are stored in Pinecone (vector database)
- Semantic Search: When a query is received, the system performs semantic search in Pinecone
- LLM Response: Gemini AI model generates responses based on the top semantic search results
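The exact backend modules aren't shown here, but the retrieval-and-generation flow described above could look roughly like the sketch below. It assumes the official `@pinecone-database/pinecone`, `cohere-ai`, and `@google/generative-ai` clients; the embedding model, the Gemini model name, and the `text` metadata field on each chunk are illustrative placeholders.

```typescript
import { Pinecone } from "@pinecone-database/pinecone";
import { CohereClient } from "cohere-ai";
import { GoogleGenerativeAI } from "@google/generative-ai";

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const gemini = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Hypothetical helper: embed the query, retrieve the top chunks, ask Gemini.
export async function answerQuery(query: string): Promise<string> {
  // 1. Embed the user query (embedding model is an assumed example)
  const embedRes = await cohere.embed({
    texts: [query],
    model: "embed-english-v3.0",
    inputType: "search_query",
  });
  const vector = (embedRes.embeddings as number[][])[0];

  // 2. Semantic search in Pinecone for the most similar chunks
  const index = pinecone.index(process.env.PINECONE_INDEX!);
  const results = await index.query({ vector, topK: 5, includeMetadata: true });
  const context = results.matches
    .map((m) => String(m.metadata?.text ?? ""))
    .join("\n---\n");

  // 3. Ask Gemini to answer using only the retrieved context
  const model = gemini.getGenerativeModel({ model: "gemini-1.5-flash" });
  const prompt = `Answer the question using the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}
```

Restricting the prompt to the retrieved chunks is what grounds the Gemini response in the original JSON data.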
The backend is built with TypeScript and Node.js. To run it:

- Install dependencies:

  ```bash
  npm install
  ```

- For development (uses tsx to run the TypeScript code directly):

  ```bash
  npm run dev
  ```

- For production (compiles the TypeScript code to JavaScript, then runs it):

  ```bash
  npm run build
  npm run start
  ```
- User speaks into the microphone on the frontend
- Speech is converted to text using the Web Speech API
- Text is sent to the backend API
- Backend performs semantic search in Pinecone using the query
- Top search results are sent to Gemini AI to generate a response
- Response is sent back to the frontend
- Frontend converts the text response to speech using the Speech Synthesis API
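Putting the frontend half of this flow together, a simplified sketch might look like the following; the `/api/query` endpoint path and the `{ answer }` response shape are assumptions rather than the repository's actual contract:

```typescript
// Sketch: send the transcript to the backend and speak the reply.
async function askAssistant(transcript: string): Promise<void> {
  const res = await fetch("/api/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: transcript }),
  });
  const { answer } = (await res.json()) as { answer: string };

  // Speak the AI response with the Speech Synthesis API
  const utterance = new SpeechSynthesisUtterance(answer);
  utterance.lang = "en-US";
  window.speechSynthesis.speak(utterance);
}
```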
- Clone the repository
- Navigate to the frontend directory
- Install dependencies:

  ```bash
  npm install
  ```

- Start the development server:

  ```bash
  npm start
  ```

- Navigate to the backend directory
- Create a `.env` file with the environment variables listed above
- Install dependencies:

  ```bash
  npm install
  ```

- Run the development server:

  ```bash
  npm run dev
  ```
The Web Speech API is not supported in all browsers. For best results, use:
- Chrome
- Edge
- Safari (partial support)
Firefox and some mobile browsers may have limited or no support for the speech recognition features.
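A simple feature-detection check lets the UI degrade gracefully on unsupported browsers, for example:

```typescript
// Feature detection sketch: warn (or fall back) when the Web Speech API is missing.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

if (!SpeechRecognitionImpl) {
  console.warn("Speech recognition is not supported in this browser.");
}
if (!("speechSynthesis" in window)) {
  console.warn("Speech synthesis is not supported in this browser.");
}
```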