/extract-subtitles

Extract subtitles from video: key frames are identified by inter-frame differencing, and the text is recognized by OCR.

Primary language: Python. License: MIT.

Subtitles Extraction

The key-frame extraction is adapted from Amanpreet Walia.

This project extracts subtitles from a video. First, key frames are extracted from the video; then the subtitle area of each key frame is cropped, and the text is recognized by OCR.
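The key-frame step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the frame difference is the mean absolute difference between consecutive grayscale frames, that the curve is smoothed with a moving average, and that key frames are the local maxima of the smoothed curve. The function names and the `window` default are hypothetical.

```python
import numpy as np
from scipy.signal import argrelextrema


def frame_differences(video_path):
    """Mean absolute difference between consecutive grayscale frames."""
    import cv2  # imported lazily so the helpers below work without OpenCV

    cap = cv2.VideoCapture(video_path)
    diffs, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or read error
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(float(cv2.absdiff(gray, prev).mean()))
        prev = gray
    cap.release()
    return np.asarray(diffs)


def smooth(values, window=5):
    """Moving-average filter; the window size is an illustrative default."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")


def key_frame_indices(diffs, window=5):
    """Indices where the smoothed difference curve has a strict local maximum."""
    smoothed = smooth(np.asarray(diffs, dtype=float), window)
    return argrelextrema(smoothed, np.greater)[0]
```

Calling `key_frame_indices(frame_differences(path))` would give candidate positions in the difference curve; index `i` there corresponds to the change between frame `i` and frame `i + 1`.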

Getting Started

Install the following dependencies:

  • OpenCV-Python (used for basic video processing e.g. read-frame-stream, crop, frame-diff, processing-gui)
  • PyTesseract (only use its image_to_string(img, lang))
  • NumPy (smoothing filter)
  • SciPy (signal.argrelextrema)
  • StrsimPy (NormalizedLevenshtein string similarity)
  • Matplotlib (draw frame differences stem plot)
  • ProgressBar
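As a rough illustration of how the cropping, OCR, and string-similarity pieces might fit together, here is a sketch under stated assumptions. The bottom-of-frame crop, the `bottom_fraction` and `threshold` values, and all function names are hypothetical. The similarity function is a plain-Python stand-in for a normalized Levenshtein similarity, not StrsimPy's implementation, and using it to drop repeated subtitle lines is an assumption about why the dependency is there.

```python
import numpy as np


def crop_subtitle_area(frame, bottom_fraction=0.25):
    """Keep the bottom part of a frame, where subtitles are assumed to sit."""
    frame = np.asarray(frame)
    start = int(frame.shape[0] * (1 - bottom_fraction))
    return frame[start:, ...]


def recognize(image, lang="eng"):
    """Run Tesseract on an image through PyTesseract."""
    import pytesseract  # imported lazily; requires Tesseract to be installed

    return pytesseract.image_to_string(image, lang=lang).strip()


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """1.0 for identical strings, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def drop_near_duplicates(lines, threshold=0.8):
    """Drop a line when it is nearly identical to the previously kept line."""
    kept = []
    for line in lines:
        if not kept or similarity(kept[-1], line) < threshold:
            kept.append(line)
    return kept
```

Comparing each line only with the last kept line suits the case where the same subtitle is recognized on several consecutive key frames with small OCR variations.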

Install any missing dependencies first with pip install -r requirements.txt

Install Tesseract OCR

Download and install it, then check that it runs. Optionally add language support; tesseract --list-langs shows which languages are installed.
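To check the installation from Python before running the extractor, something like the following sketch could be used. It assumes the `tesseract` executable is on `PATH` and that the `--list-langs` output starts with a "List of available languages" header line; the function names are hypothetical, and any extra warnings Tesseract prints would not be filtered out.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line, e.g. "List of available languages (2):".
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages():
    """Return installed language codes, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Some versions print the list to stderr, so look at both streams.
    return parse_langs(result.stdout + "\n" + result.stderr)
```

An empty list means either that Tesseract was not found or that no language data is installed.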

Run

λ python extract_subtitles.py <videopath>

License

This project is licensed under the MIT License; see LICENSE for details.