The live website for this application is here: WIP
This is a transcript searcher for the YouTube channel of Chiblee (https://www.youtube.com/@ChibleeVODs). It finds every occurrence of a given phrase being spoken, or shows an interactive list of the words he said in a particular video.
- All of the transcripts are downloaded using yt-dlp, and stored as txt files.
- The txt files are then parsed and split into individual lines, each with its timestamp and its text.
- The parsed lines are used to create 2 tables in a database: an Inverted index and one called 'Transcript'.
- The Inverted index stores each word with a list of the ids of every video it appears in, along with the timestamps of when it was said in those videos. Techniques such as tokenisation, lemmatisation, and stop-word removal are used here.
- The Transcript table simply stores all the text and timestamps in all videos as a single string.
- The data in these 2 tables is then looped over and the key information is returned.
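
The parsing and indexing steps above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `HH:MM:SS text` line format, the tiny `STOP_WORDS` set, and the function names are all assumptions, and lemmatisation is left out because it would need an external library such as NLTK or spaCy.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and"}

# Assumed line format: "HH:MM:SS some spoken text".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_transcript(raw):
    """Split raw transcript text into (timestamp, text) pairs."""
    lines = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            lines.append((match.group(1), match.group(2)))
    return lines


def tokenise(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_inverted_index(transcripts):
    """Map each word to {video_id: [timestamps where it was said]}."""
    index = defaultdict(lambda: defaultdict(list))
    for video_id, raw in transcripts.items():
        for timestamp, text in parse_transcript(raw):
            for word in tokenise(text):
                index[word][video_id].append(timestamp)
    return index


# Made-up sample data, keyed by a hypothetical video id.
transcripts = {
    "vid1": "00:00:01 Hello the world\n00:00:05 hello again",
    "vid2": "00:01:00 the world is big",
}
index = build_inverted_index(transcripts)
```

With this layout, looking up a word is a single dictionary access, which is what makes the index fast compared with scanning every transcript.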
The phrase finder uses the Inverted index to very quickly find the occurrences of a phrase, and the Transcript table is used to show the context of the phrase (which appears below the video). The 'individual' finder, which gives an interactive transcript for a certain video, works by passing the video's unique title or id to the Transcript table, which returns all the timestamps and text.
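
As a rough sketch of how the two finders might combine the tables, the example below uses small in-memory dictionaries as stand-ins for the database. The data, the function names `find_phrase` and `get_transcript`, and the simplification that a phrase must sit within a single transcript line are all assumptions made for illustration. In a real setup the query would also need the same normalisation (stop-word removal, lemmatisation) as the index.

```python
# Hypothetical stand-ins for the two database tables.
inverted_index = {
    "good": {"vid1": ["00:00:05", "00:02:10"], "vid2": ["00:01:00"]},
    "morning": {"vid1": ["00:00:05"], "vid2": ["00:03:30"]},
}
transcript = {
    "vid1": [
        ("00:00:01", "hey chat"),
        ("00:00:05", "good morning everyone"),
        ("00:02:10", "that was good"),
    ],
    "vid2": [
        ("00:01:00", "good game"),
        ("00:03:30", "this morning i woke up"),
    ],
}


def find_phrase(phrase):
    """Return (video_id, timestamp, context) for each line containing the phrase."""
    words = phrase.lower().split()
    if not words:
        return []

    # Step 1: intersect the postings of every word to get candidate lines.
    candidates = None
    for word in words:
        postings = {
            (video_id, ts)
            for video_id, stamps in inverted_index.get(word, {}).items()
            for ts in stamps
        }
        candidates = postings if candidates is None else candidates & postings

    # Step 2: confirm each candidate against the transcript and collect context.
    hits = []
    for video_id, ts in sorted(candidates):
        lines = transcript[video_id]
        for i, (line_ts, text) in enumerate(lines):
            if line_ts == ts and phrase.lower() in text.lower():
                # Context here is the previous, matching, and next line.
                context = " ".join(t for _, t in lines[max(0, i - 1): i + 2])
                hits.append((video_id, ts, context))
    return hits


def get_transcript(video_id):
    """The 'individual' finder: all (timestamp, text) pairs for one video."""
    return transcript.get(video_id, [])


results = find_phrase("good morning")
```

The index narrows the search to a handful of candidate lines, so the slower text comparison against the Transcript data only runs on those candidates.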