Extract segments from downloaded YouTube videos based on their captions.
Created after I noticed that CGP Estate Agents used phrases such as "a large size room" or "a good size" comically often during their tours. After manually editing a supercut of some occurrences, I endeavored to (somewhat) automate the process.
Before starting, update the keywords at the top of cutter.py.
- Download videos using whichever method you prefer. I use yt-dlp since it has proven to be faster than youtube-dl.
- Scrape subtitles using download_subtitles.sh. Make sure to modify this file if your video files are named differently.
- Run every_vid_in_folder.sh in the folder containing the videos. Make sure to modify the output folder if you'd like. Note that it assumes the subtitle files are named video.mp4.json.
- Run cutter.py with the arguments [video file], [subtitle file], [clips output directory]
The subtitles must be in the JSON format produced by youtube_transcript_api.
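As a rough illustration of how keyword matching against that JSON could work, here is a minimal sketch. It is not the code in cutter.py. It assumes each subtitle entry is a dict with "text", "start", and "duration" keys (the shape youtube_transcript_api transcripts commonly have). The names KEYWORDS, find_clips, load_subtitles, and the padding parameter are hypothetical and used only for this example.

```python
import json

# Hypothetical keyword list; the real one lives at the top of cutter.py.
KEYWORDS = ["large size", "good size"]


def find_clips(entries, keywords, padding=0.5):
    """Return (start, end) times in seconds for entries containing a keyword.

    Assumes each entry is a dict with "text", "start", and "duration" keys.
    """
    clips = []
    for entry in entries:
        text = entry["text"].lower()
        if any(keyword.lower() in text for keyword in keywords):
            # Pad both ends a little so the phrase is not cut off mid-word.
            start = max(0.0, entry["start"] - padding)
            end = entry["start"] + entry["duration"] + padding
            clips.append((start, end))
    return clips


def load_subtitles(path):
    """Load a subtitle file such as video.mp4.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Small in-memory sample instead of a real subtitle file.
sample = [
    {"text": "this is a good size room", "start": 12.0, "duration": 3.0},
    {"text": "and here is the garden", "start": 15.0, "duration": 2.5},
]
print(find_clips(sample, KEYWORDS))  # → [(11.5, 15.5)]
```

Each (start, end) pair could then be handed to whatever tool does the actual cutting. Matching is done per caption entry, so a phrase that is split across two entries would be missed by this sketch.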
- Automate video and subtitle downloading; feed the program a list of video IDs and have it work
- Improve handling of arguments and configuration