/chiblee-transcript-searcher

This is a transcript searcher for Chiblee's YouTube VOD channel (https://www.youtube.com/@ChibleeVODs).

Primary Language: Java

The live website for this application is a work in progress (WIP).

This application searches the transcripts of Chiblee's YouTube channel (https://www.youtube.com/@ChibleeVODs). It finds every occurrence of a given phrase being spoken, or shows an interactive list of the words he said in a particular video.

How it works (simplified)

  1. All of the transcripts are downloaded using yt-dlp and stored as txt files.
  2. The txt files are then parsed and split into individual lines, each with its timestamp and its text.
  3. The parsed lines are used to create 2 tables in a database: an Inverted index and one called 'Transcript'.
  4. The Inverted index stores each word with a list of the ids of every video it has appeared in, along with the timestamps of when it was said in those videos. Techniques such as tokenisation, lemmatisation, and stop-word removal are applied here.
  5. The Transcript table simply stores all the text and timestamps of each video as a single string.
  6. Finally, the data in these 2 tables is looped over and the key information is returned.
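
Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `HH:MM:SS text` line format, the class and method names, and the tiny stop-word list are all assumptions, the index is held in an in-memory map instead of a database table, and lemmatisation is left out for brevity. It uses records, so it needs Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class InvertedIndexSketch {

    /** One parsed transcript line: which video, when it was said, and the text. */
    record Line(String videoId, int seconds, String text) {}

    /** One occurrence of a word: which video, and at what time. */
    record Posting(String videoId, int seconds) {}

    // Tiny illustrative stop-word list (assumption); a real list would be much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "is", "of", "to");

    /** Parses a line in the assumed "HH:MM:SS text" format. */
    static Line parseLine(String videoId, String raw) {
        String[] parts = raw.trim().split("\\s+", 2);
        String[] hms = parts[0].split(":");
        int seconds = Integer.parseInt(hms[0]) * 3600
                + Integer.parseInt(hms[1]) * 60
                + Integer.parseInt(hms[2]);
        String text = parts.length > 1 ? parts[1] : "";
        return new Line(videoId, seconds, text);
    }

    /** Tokenises: lower-cases, splits on non-word characters, drops stop-words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Maps each remaining word to every (video id, timestamp) where it was said. */
    static Map<String, List<Posting>> buildIndex(List<Line> lines) {
        Map<String, List<Posting>> index = new LinkedHashMap<>();
        for (Line line : lines) {
            for (String token : tokenize(line.text())) {
                index.computeIfAbsent(token, k -> new ArrayList<>())
                     .add(new Posting(line.videoId(), line.seconds()));
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Line> lines = List.of(
                parseLine("video1", "00:00:05 the chat is wild today"),
                parseLine("video2", "00:12:40 what a wild run"));
        // Print each indexed word with its list of postings.
        buildIndex(lines).forEach((word, postings) ->
                System.out.println(word + " -> " + postings));
    }
}
```

Storing the timestamp in each posting is what lets a search result jump straight to the moment a word was spoken.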

The phrase finder uses the Inverted index to quickly find the occurrences of a phrase, and the Transcript table is used to show the context of the phrase (which appears below the video). The 'individual' finder, which gives an interactive transcript for a certain video, works by passing the video's unique title or id to the Transcript table, which returns all of that video's timestamps and text.
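
One way a phrase lookup over an inverted index could work is sketched below. The README does not say how the phrase finder matches phrases, so this is an assumption: it intersects the postings of every word in the phrase, returning the transcript lines in which all of the words appear. The `PhraseFinderSketch` class, the `findPhrase` method, and the in-memory `Map` standing in for the database table are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PhraseFinderSketch {

    /** One occurrence of a word: which video, and at what time. */
    record Posting(String videoId, int seconds) {}

    /**
     * Returns the postings shared by every word of the phrase, i.e. the
     * (video id, timestamp) pairs of lines that contain all of the words.
     */
    static List<Posting> findPhrase(Map<String, List<Posting>> index, List<String> words) {
        if (words.isEmpty()) {
            return List.of();
        }
        // Start from the first word's postings, keeping their original order.
        Set<Posting> candidates =
                new LinkedHashSet<>(index.getOrDefault(words.get(0), List.of()));
        // Keep only postings that every remaining word also has.
        for (String word : words.subList(1, words.size())) {
            candidates.retainAll(new HashSet<>(index.getOrDefault(word, List.of())));
        }
        return new ArrayList<>(candidates);
    }

    public static void main(String[] args) {
        Map<String, List<Posting>> index = Map.of(
                "good", List.of(new Posting("video1", 10), new Posting("video2", 30)),
                "morning", List.of(new Posting("video1", 10), new Posting("video1", 50)));
        System.out.println(findPhrase(index, List.of("good", "morning")));
    }
}
```

This intersection ignores word order and misses a phrase that is split across two transcript lines, so each candidate would still need to be checked against the full text, for example the text held in the Transcript table.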

screenshot-of-search