/subtitle-merge

A module for synchronizing two sets of subtitle objects and outputting them as one.

Primary language: JavaScript

Subtitles Merge

The goal of this project is to take in two arrays of subtitle objects and pair any whose times match within a given range.

This module depends on another library or external process to parse the subtitles, so it should be format agnostic, provided you can find a means to parse the subtitles into the required format. I may include snippets here of ways to do that with other modules at a later date.

Usage

Functions

merge(s1, s2, offset) -> Array

The default export. It takes in two arrays of subtitle objects (s1 and s2) to pair. offset specifies the leniency (in milliseconds) for determining matches.

default.isInRange(time1, time2, variance) -> Boolean

A simple comparison function that checks whether time1 and time2 fall within variance of each other.
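
The source for isInRange is not reproduced here, so the following is only a minimal sketch of what such a comparison might look like. It assumes times are in milliseconds and that a difference exactly equal to variance still counts as a match; the real function may treat the boundary differently.

```javascript
// Hypothetical sketch, not the module's actual implementation.
// Assumes an inclusive comparison on the absolute difference of the two times.
function isInRange(time1, time2, variance = 0) {
  return Math.abs(time1 - time2) <= variance;
}

console.log(isInRange(1000, 1040, 50)); // true
console.log(isInRange(1000, 1100, 50)); // false
```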

default.isValidPair(s1, s2, offset) -> String|Boolean

Takes in two subtitle objects and checks what kind of match (if any) it is. The current set of responses is:

  'FULL_MATCH',
  'START_MATCH',
  'END_MATCH',

  'START_LATE_END_EARLY',
  'START_EARLY_END_LATE',

  'START_MATCH_END_EARLY',
  'START_MATCH_END_LATE',

  'START_EARLY_END_MATCH',
  'START_LATE_END_MATCH',
  'START_AFTER_END',
  'END_BEFORE_START',

Testing

  • npm run test
  • npm run lint
  • npm run coverage (runs tests with NYC coverage checks)

Object structures

Input Format

The structure of subtitle objects attempts to follow the format used by the best subtitle-parsing module I could find on npm, subtitles.js, which is as follows:

{
    start: 123, // time in milliseconds
    end: 456, // same
    text: 'a string', // the displayed subtitles
}
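
If your parser hands back SRT-style timestamps rather than milliseconds, a small conversion step is enough to reach the format above. The helpers below (srtTimeToMs and toSubtitle) are hypothetical names used for illustration and are not part of this module; they assume timestamps of the form HH:MM:SS,mmm.

```javascript
// Hypothetical helper, not part of this module.
// Converts an "HH:MM:SS,mmm" (or "HH:MM:SS.mmm") timestamp to milliseconds.
function srtTimeToMs(timestamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(timestamp.trim());
  if (!match) {
    throw new Error(`Unrecognised timestamp: ${timestamp}`);
  }
  // match[0] is the whole string, so skip it and convert the four captured groups.
  const [, hours, minutes, seconds, millis] = match.map(Number);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Builds a subtitle object in the input format described above.
function toSubtitle(startStamp, endStamp, text) {
  return {
    start: srtTimeToMs(startStamp),
    end: srtTimeToMs(endStamp),
    text,
  };
}

console.log(toSubtitle('00:00:01,500', '00:00:03,000', 'a string'));
```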

Output Format

Output matches the input, but with an added secondaryText attribute for paired subtitles. This may be changed at a later date.

{
    start: 123, // time in milliseconds
    end: 456, // same
    text: 'a string', // the displayed subtitles
    secondaryText: 'una cuerda(???)',
}
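
As a rough illustration of that shape, a paired entry can be thought of as the primary subtitle with the secondary subtitle's text attached. pairSubtitles is a hypothetical name used only for this sketch, and it assumes the timing of the primary subtitle is the one that is kept.

```javascript
// Hypothetical sketch of how a paired output entry could be built.
// Assumes the primary subtitle's start and end are kept unchanged.
function pairSubtitles(primary, secondary) {
  return { ...primary, secondaryText: secondary.text };
}

const paired = pairSubtitles(
  { start: 123, end: 456, text: 'a string' },
  { start: 130, end: 450, text: 'una cuerda' }
);
console.log(paired);
```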

The Matching Algorithm

At the moment it's very primitive and mostly done in a "let's get something that works" mindset, with the first set of subtitles being the primary set which the other needs to match against. There is a custom config object included in the source code which will be expanded to provide the default arguments for selecting which kinds of matches are accepted.
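
To make the primary-driven approach concrete, here is a simplified sketch rather than the module's actual code. The names mergeSketch, classify and defaultConfig, and the shape of the config object, are all assumptions made for illustration. The classifier only recognises 'FULL_MATCH', and unpaired primary subtitles are assumed to pass through without a secondaryText attribute.

```javascript
// Hypothetical config: which match types are accepted when pairing.
const defaultConfig = { accepted: ['FULL_MATCH'] };

// Simplified classifier for this sketch only: returns 'FULL_MATCH' when both
// start and end times are within offset milliseconds, otherwise false.
function classify(s1, s2, offset) {
  const close = (a, b) => Math.abs(a - b) <= offset;
  return close(s1.start, s2.start) && close(s1.end, s2.end) ? 'FULL_MATCH' : false;
}

// Walks the primary set and attaches the text of the first secondary subtitle
// whose match type is accepted by the config.
function mergeSketch(primary, secondary, offset, config = defaultConfig) {
  return primary.map((p) => {
    const partner = secondary.find((s) =>
      config.accepted.includes(classify(p, s, offset))
    );
    return partner ? { ...p, secondaryText: partner.text } : { ...p };
  });
}

const merged = mergeSketch(
  [
    { start: 0, end: 1000, text: 'Hello' },
    { start: 2000, end: 3000, text: 'Goodbye' },
  ],
  [{ start: 30, end: 980, text: 'Hola' }],
  50
);
console.log(merged);
```

Keeping the accepted match types in a config object means the loop itself does not need to change when more match types are allowed.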

Legacy Stuff

As this is branched off from a previous project that was very SSA-focused, there may be some leftover things that look weird. Let me know and I'll resolve them.

Future plans

  • Implement custom matching arguments
  • Handle overlapping subtitles (e.g. the secondary subtitles have an entry which contains text for the equivalent of several primary subtitles)

As the written tests for this are rather limited and in some ways self-fulfilling, I've routinely tested the module with randomly selected matching pairs of subtitles to ensure major issues are caught.