TL;DR:
- Paste a YouTube video URL
- The video is transcribed with OpenAI Whisper
- The transcript segments are reformatted into chunks of 40 seconds so each segment carries more context (see the sketch after this list)
- Embeddings are created with the OpenAI embeddings endpoint
- The embeddings are saved in Supabase, which serves as the vector database
- When searching, the query is converted to an embedding, then a Supabase Postgres function searches for similarities
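
As a rough illustration of the chunking step, here is a minimal sketch of how Whisper's short segments could be merged into ~40-second windows. The function and field names are assumptions for illustration, not the project's actual code (Whisper does return segments shaped like `{"start", "end", "text"}`):

```python
# Sketch: merge Whisper's short segments into ~40-second chunks.
def merge_segments(segments, window=40.0):
    chunks, current, chunk_start = [], [], 0.0
    for seg in segments:
        current.append(seg["text"].strip())
        if seg["end"] - chunk_start >= window:
            chunks.append({"start": chunk_start, "end": seg["end"],
                           "text": " ".join(current)})
            current, chunk_start = [], seg["end"]
    if current:  # flush the trailing partial chunk
        chunks.append({"start": chunk_start, "end": segments[-1]["end"],
                       "text": " ".join(current)})
    return chunks
```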
Demo: `yt-semantic.mp4`
The transcription is done in a Python Flask app running OpenAI Whisper (see `transcription_backend/`).
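
For reference, a minimal sketch of what such an endpoint could look like; the route name, request shape, and model size are assumptions, and the actual app may differ:

```python
# Sketch of a Flask endpoint transcribing audio with OpenAI Whisper.
import whisper
from flask import Flask, request, jsonify

app = Flask(__name__)
model = whisper.load_model("base")  # model size is an assumption

@app.route("/transcribe", methods=["POST"])  # hypothetical route
def transcribe():
    path = request.json["audio_path"]  # hypothetical request field
    result = model.transcribe(path)
    # Whisper returns the full text plus timestamped segments.
    return jsonify(segments=result["segments"])
```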
The transcript chunks are converted to embeddings using the OpenAI embeddings API.
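
A minimal sketch of that call with the current OpenAI Python client (the model name is assumed; the project may use a different one or an older client version):

```python
# Sketch: embed transcript chunks with the OpenAI embeddings API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    resp = client.embeddings.create(
        model="text-embedding-ada-002",  # assumed model
        input=texts,
    )
    return [item.embedding for item in resp.data]
```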
The embeddings are stored in a Supabase database with the pgvector extension. A Postgres function is used for the similarity search (see Supabase's pgvector documentation for details).
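
To illustrate the search path, here is a sketch of converting a query to an embedding and calling a pgvector-backed Postgres function through supabase-py. The function and parameter names are assumptions modeled on Supabase's pgvector examples, not necessarily this project's schema:

```python
# Sketch: semantic search against Supabase pgvector.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def search(query, top_k=5):
    query_embedding = embed([query])[0]  # embed() from the sketch above
    # Calls a Postgres function (hypothetically "match_segments") that
    # orders rows by cosine distance using pgvector's <=> operator.
    resp = supabase.rpc(
        "match_segments",
        {"query_embedding": query_embedding, "match_count": top_k},
    ).execute()
    return resp.data
```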
Run the Python backend
> flask --app transcription_backend/server run
Run the front-end
> cd webapp
> npm run dev