Key-frame extraction is based on work by Amanpreet Walia.
This project extracts subtitles from a video. First, key frames are extracted from the video; then the subtitle area of each key frame is cropped, and the text in it is recognized by OCR.
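The crop-then-recognize step can be sketched as below. This is a minimal illustration, not the project's actual code: `crop_subtitle_area`, `recognize_subtitles`, and the `top`/`bottom` fractions are hypothetical names, and the sketch assumes subtitles sit in the bottom fifth of the picture. The OCR engine is passed in as a plain callable so the example runs without Tesseract installed.

```python
import numpy as np


def crop_subtitle_area(frame, top=0.8, bottom=1.0):
    """Return the horizontal band of `frame` between two height fractions.

    `top` and `bottom` are hypothetical parameters (fractions of the frame
    height); the defaults assume subtitles are in the bottom fifth.
    """
    height = frame.shape[0]
    return frame[int(height * top):int(height * bottom), :]


def recognize_subtitles(key_frames, ocr):
    """Crop each key frame and pass the crop to `ocr`, an image -> str callable."""
    texts = []
    for frame in key_frames:
        text = ocr(crop_subtitle_area(frame)).strip()
        if text:  # skip frames where nothing was recognized
            texts.append(text)
    return texts
```

With PyTesseract installed, `ocr` could be something like `lambda img: pytesseract.image_to_string(img, lang="eng")`.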
- OpenCV-Python (basic video processing, e.g. reading the frame stream, cropping, frame differencing, and the processing GUI)
- PyTesseract (only its `image_to_string(img, lang)` is used)
- NumPy (`smooth` filter)
- SciPy (`signal.argrelextrema`)
- StrsimPy (`NormalizedLevenshtein` string similarity)
- Matplotlib (draws the stem plot of frame differences)
- ProgressBar
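How these pieces might fit together is sketched below, under stated assumptions: key frames are taken where the smoothed frame-difference signal has a local maximum, and consecutive OCR results that are nearly identical are merged. To stay dependency-light, the sketch swaps in a plain NumPy comparison for `signal.argrelextrema` and a pure-Python edit distance for StrsimPy's `NormalizedLevenshtein`. All function names, the window length, and the `threshold` value are hypothetical, not taken from the project.

```python
import numpy as np


def smooth(values, window_len=5):
    """Smooth a 1-D signal with a normalized Hanning window."""
    values = np.asarray(values, dtype=float)
    if window_len % 2 == 0:
        window_len += 1  # an odd window keeps the output aligned with the input
    if window_len < 3 or values.size < window_len:
        return values
    window = np.hanning(window_len)
    # Reflect the signal at both ends so the output keeps the input length.
    padded = np.pad(values, window_len // 2, mode="reflect")
    return np.convolve(padded, window / window.sum(), mode="valid")


def key_frame_indices(frame_diffs, window_len=5):
    """Indices where the smoothed difference is a strict local maximum."""
    s = smooth(frame_diffs, window_len)
    if s.size < 3:
        return []
    inner = s[1:-1]
    is_peak = (inner > s[:-2]) & (inner > s[2:])
    return (np.flatnonzero(is_peak) + 1).tolist()


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (0.0 means identical)."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def drop_near_duplicates(texts, threshold=0.2):
    """Keep a text only if it differs enough from the last kept one."""
    kept = []
    for text in texts:
        if not kept or normalized_levenshtein(kept[-1], text) > threshold:
            kept.append(text)
    return kept
```

Comparing each text only against the last kept one fits the assumption that the same subtitle shows up in adjacent key frames, with small OCR differences between them.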
Install missing dependencies first with `pip install -r requirements.txt`.
Download the project and try running it. Optionally, pick an OCR language from those listed by `tesseract --list-langs`.
λ python extract_subtitles.py <videopath>
This project is licensed under the MIT License - see LICENSE for details