/simple-yet-powerful-srt-subtitle-parser-cpp

A single-header, simple yet powerful, full-blown SRT subtitle parser written in C++.

srtparser.h : Simple, yet powerful C++ SRT Subtitle Parser Library.

srtparser.h is a single-header, simple and powerful C++ srt subtitle parsing library that lets you easily handle, process and manipulate srt subtitle files in your project. It is an extension of Oleksii Maryshchenko’s simple subtitle-parser. It has the following features:

  1. It is a single header C++ (CPP) file, and can be easily used in your project.

  2. Focus on portability, efficiency and simplicity with no external dependency.

  3. A wide variety of functions at the programmer's disposal to parse an srt file as needed.

  4. Capable of :

    • extracting and stripping HTML and other styling tags from subtitle text.

    • extracting and stripping speaker names.

    • extracting and stripping non-dialogue text.

  5. Easy to extend and add new functionalities.

How to use srtparser.h

General usage

srtparser.h is a cross-platform, robust srt subtitle parser.

SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");
SubtitleParser *parser = subParserFactory->getParser();

//to get subtitles

std::vector<SubtitleItem*> sub = parser->getSubtitles();
  • Call appropriate functions to perform parsing.

See demo usage in examples directory.
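For orientation, each entry in an .srt file consists of a sequence number, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of text. The following is a minimal standalone sketch of how such a timing line maps to millisecond values. It does not use srtparser.h and is not the library's implementation; the helper names `timestampToMs` and `parseTimingLine` are hypothetical and exist only for illustration.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper (not part of srtparser.h): converts one SRT timestamp
// of the form "HH:MM:SS,mmm" to milliseconds. Returns -1 if the text does
// not match that pattern.
long int timestampToMs(const std::string &stamp)
{
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(stamp.c_str(), "%d:%d:%d,%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Hypothetical helper: splits a timing line "start --> end" into a
// (start, end) pair of millisecond values. Returns (-1, -1) if the
// " --> " separator is missing.
std::pair<long int, long int> parseTimingLine(const std::string &line)
{
    const std::string arrow = " --> ";
    const std::string::size_type pos = line.find(arrow);
    if (pos == std::string::npos)
        return std::make_pair(-1L, -1L);
    return std::make_pair(timestampToMs(line.substr(0, pos)),
                          timestampToMs(line.substr(pos + arrow.size())));
}
```

With these helpers, `parseTimingLine("00:00:01,500 --> 00:00:04,000")` yields the pair (1500, 4000).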

Parser Functions

The following is a complete list of available parser functions.

Syntax:

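Each entry below lists four fields in order: Class, Return Type, Function and Description. In the examples for SubtitleItem, sub denotes a pointer to a single SubtitleItem, i.e. one element of the vector returned by getSubtitles(), not the vector itself.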

SubtitleParserFactory

SubtitleParserFactory

SubtitleParserFactory("inputFile.srt")

Creates a SubtitleParserFactory object, where inputFile.srt is the path of the subtitle file to be parsed. This object is used to create the parser.

E.g.: SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");

SubtitleParserFactory

SubtitleParser

getParser()

Returns the SubtitleParser object. This object will be used to parse the subtitle file.

E.g.: SubtitleParser *parser = subParserFactory->getParser();

SubtitleParser

std::vector<SubtitleItem*>

getSubtitles()

Returns the parsed subtitles as a vector of pointers to SubtitleItem objects, one per subtitle entry.

E.g.: std::vector<SubtitleItem*> sub = parser->getSubtitles();

SubtitleParser

std::string

getFileData()

Returns the complete file contents, exactly as read from inputFile.srt.

E.g.: std::string fileData = parser->getFileData();

SubtitleItem

long int

getStartTime()

Returns the starting time of subtitle in milliseconds.

E.g.: long int startTime = sub->getStartTime();

SubtitleItem

long int

getEndTime()

Returns the ending time of subtitle in milliseconds.

E.g.: long int endTime = sub->getEndTime();

SubtitleItem

std::string

getStartTimeString()

Returns the starting time of subtitle in srt format.

E.g.: std::string startTime = sub->getStartTimeString();

SubtitleItem

std::string

getEndTimeString()

Returns the ending time of subtitle in srt format.

E.g.: std::string endTime = sub->getEndTimeString();

SubtitleItem

std::string

getText()

Returns the subtitle text as present in .srt file.

E.g.: std::string text = sub->getText();

SubtitleItem

std::string

getDialogue(bool keepHTML, bool doNotIgnoreNonDialogues, bool doNotRemoveSpeakerNames);

Returns the subtitle text after processing according to parameters.

keepHTML = 1 to stop the parser from stripping style tags.

doNotIgnoreNonDialogues = 1 to stop the parser from extracting and removing non-dialogue text such as (laughter).

doNotRemoveSpeakerNames = 1 to stop the parser from extracting and removing speaker names.

By default, the values (0,0,0) are passed, so style tags, non-dialogue text and speaker names are all removed.

E.g.: std::string text = sub->getDialogue();

SubtitleItem

int

getWordCount()

Returns the number of words present in the subtitle dialogue.

E.g.: int wordCount = sub->getWordCount();

SubtitleItem

std::vector<std::string>

getIndividualWords()

Returns a string vector of the individual words present in the subtitle.

E.g.: std::vector<std::string> words = sub->getIndividualWords();

SubtitleItem

bool

getIgnoreStatus()

Returns the ignore status, which is true if the justDialogue field (i.e. the subtitle text after processing) is empty.

E.g.: bool ignore = sub->getIgnoreStatus();

SubtitleItem

int

getSpeakerCount()

Returns the number of speakers present in the subtitle.

E.g.: int speakerCount = sub->getSpeakerCount();

SubtitleItem

std::vector<std::string>

getSpeakerNames()

Returns a string vector of speaker names.

E.g.: std::vector<std::string> speakerNames = sub->getSpeakerNames();

SubtitleItem

int

getNonDialogueCount()

Returns the number of non-dialogue words present in the subtitle.

E.g.: int nonDialogueCount = sub->getNonDialogueCount();

SubtitleItem

std::vector<std::string>

getNonDialogueWords()

Returns a string vector of non-dialogue words.

E.g.: std::vector<std::string> nonDialogueWords = sub->getNonDialogueWords();

SubtitleItem

int

getStyleTagCount()

Returns the number of style tags present in the subtitle.

E.g.: int styleTagCount = sub->getStyleTagCount();

SubtitleItem

std::vector<std::string>

getStyleTags()

Returns a string vector of style tags.

E.g.: std::vector<std::string> styleTags = sub->getStyleTags();

SubtitleWord

std::string

getText()

Returns the subtitle text as present in .srt file.

E.g.: std::string text = sub->getText();
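To make the three getDialogue() flags more concrete, here is a rough standalone sketch of the kind of clean-up they control. It is not the srtparser.h implementation: all function names are hypothetical, and it rests on simplifying assumptions, namely that style tags are anything between `<` and `>`, that non-dialogue text is anything between `(` and `)`, and that a speaker name is a run of uppercase letters and spaces at the start of a single-line subtitle, followed by a colon. The library's actual rules may differ.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drops every span from `open` up to the next `close`.
// An unclosed `open` drops the rest of the text.
std::string removeSpans(const std::string &text, char open, char close)
{
    std::string out;
    bool inside = false;
    for (char c : text) {
        if (!inside && c == open) inside = true;
        else if (inside && c == close) inside = false;
        else if (!inside) out += c;
    }
    return out;
}

// Hypothetical helper: collapses runs of spaces and trims a trailing space.
std::string squeezeSpaces(const std::string &text)
{
    std::string out;
    for (char c : text) {
        if (c == ' ' && (out.empty() || out.back() == ' ')) continue;
        out += c;
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}

// Hypothetical helper: removes a leading "NAME:" prefix, assuming the name
// consists only of uppercase letters and spaces.
std::string removeSpeakerName(const std::string &line)
{
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos || colon == 0) return line;
    for (std::string::size_type i = 0; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(line[i]);
        if (!(std::isupper(c) || c == ' ')) return line;
    }
    const std::string::size_type start = line.find_first_not_of(' ', colon + 1);
    return start == std::string::npos ? std::string() : line.substr(start);
}

// Sketch of flag-controlled clean-up; the flags mirror the meaning of the
// getDialogue() parameters, but this is an illustration, not the library code.
std::string cleanDialogue(std::string text, bool keepHTML = false,
                          bool keepNonDialogue = false,
                          bool keepSpeakerNames = false)
{
    if (!keepHTML) text = removeSpans(text, '<', '>');
    if (!keepNonDialogue) text = removeSpans(text, '(', ')');
    text = squeezeSpaces(text);
    if (!keepSpeakerNames) text = removeSpeakerName(text);
    return text;
}

// Splits text on whitespace, analogous in spirit to getIndividualWords().
std::vector<std::string> splitWords(const std::string &text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}
```

Under these assumptions, `cleanDialogue("<i>JOHN: (laughs) Hello there</i>")` returns `Hello there`, while passing `true` for all three flags returns the text unchanged. The sketch requires C++11 or later.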

Examples

While I’ve tried to include examples in the above table, a compilation of all of them together in a single C++ program can be found in the examples directory.

Contributing

Suggestions, feature requests, PRs, bug reports and bug fixes are welcome. I’ll be thankful.

Credits

Built upon an MIT-licensed simple subtitle parser called LibSub-Parser by Oleksii Maryshchenko.

The original parser had 3 major functions : getStartTime(), getEndTime() and getText().

The rest of the work was done by Saurabh Shrivastava, originally for use in his GSoC project.