
Segments a wav file into several smaller audio clips using an accompanying .srt closed captioning file.





usage: srt-parse [-h] [--output-dir OUTPUT_DIR]
             [--audio-out-file-pattern AUDIO_OUT_FILE_PATTERN]
             [--text-out-file-pattern TEXT_OUT_FILE_PATTERN]
             [--output-type {txt,csv}] [--csv-seperator CSV_SEPERATOR]
             [--csv-filename CSV_FILENAME]
             [--update-increment UPDATE_INCREMENT]
             [--in-encoding IN_ENCODING] [--out-encoding OUT_ENCODING]
             audio_input srt_input

Segment wav files according to a provided .srt closed caption file

positional arguments:
  audio_input           Location of .wav file to be processed
  srt_input             Location of .srt file to be processed

optional arguments:
  -h, --help            show this help message and exit
  --output-dir OUTPUT_DIR
                        Directory for processed files to be saved to
  --audio-out-file-pattern AUDIO_OUT_FILE_PATTERN
                        A python-style f-string for saving audio files
  --text-out-file-pattern TEXT_OUT_FILE_PATTERN
                        A python-style f-string for saving text files
  --output-type {txt,csv}
                        Output filetype
  --csv-seperator CSV_SEPERATOR
                        Character sequence used to seperate values in csv
  --csv-filename CSV_FILENAME
                        Name of file to write as csv
  --update-increment UPDATE_INCREMENT
                        Print progress after every specified amount of
  --in-encoding IN_ENCODING
                        Encoding used to read the .srt file
  --out-encoding OUT_ENCODING
                        Encoding to use when writing text data to file


Using srt-parse:

python3 srt-parse.py foo.wav foo.srt

This writes its output files to the output directory (by default .\out\).


One file is created per subtitle in the .srt file, and out.csv pairs each audio file with its transcript.
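
For readers who want to see the general idea behind per-subtitle segmentation, here is a minimal sketch using only the Python standard library. It is not taken from srt-parse itself and may differ from how the tool works internally. The function names parse_srt and cut_wav, and the file names in the closing comment, are hypothetical and chosen only for illustration. It assumes the usual SubRip layout, where each cue has a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm followed by its caption text.

```python
import re
import wave

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:03,250".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples.

    Hypothetical helper: assumes cues are separated by blank lines.
    """
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is caption text.
                caption = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, end, caption))
                break
    return cues


def cut_wav(src_path, dst_path, start, end):
    """Copy the frames between start and end (in seconds) into a new wav file."""
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame offsets and clamp them to the file length.
        first = min(int(start * rate), total)
        last = min(int(end * rate), total)
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
        with wave.open(dst_path, "wb") as dst:
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)


# Example with hypothetical file names:
#   with open("foo.srt", encoding="utf-8") as f:
#       cues = parse_srt(f.read())
#   for n, (start, end, caption) in enumerate(cues):
#       cut_wav("foo.wav", f"clip_{n}.wav", start, end)
```

The sketch works in whole frames, so each clip boundary is rounded down to the nearest sample. It only handles uncompressed PCM wav files, which is all the standard wave module supports.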