Transpose timestamps

A Node module that aligns speech-to-text (STT) transcript data with an accurate, human-produced base transcript. It transposes the words from the accurate text onto the timecodes of the STT data, using a diffing algorithm.

This project revisits the concept of stt-align-node and tries an alternative solution to the same alignment problem.

It is intended for use in slate-transcript-editor, a component of autoEdit3.

Setup

git clone git@github.com:pietrop/transpose-timestamps.git
cd transpose-timestamps
npm install

Usage

example input - transcript
{
    "words": [
        ...
        {
            "id": 46,
            "start": 29.11,
            "end": 29.41,
            "text": "Call"
        },
        {
            "id": 47,
            "start": 29.41,
            "end": 29.63,
            "text": "me"
        },
        {
            "id": 48,
            "start": 29.63,
            "end": 30.35,
            "text": "Ishmael."
        },
        {
            "id": 49,
            "start": 30.9,
            "end": 31.21,
            "text": "Some"
        },
        {
            "id": 50,
            "start": 31.21,
            "end": 31.57,
            "text": "years"
        },
        {
            "id": 51,
            "start": 31.57,
            "end": 32.13,
            "text": "ago."
        },
        {
            "id": 52,
            "start": 32.29,
            "end": 32.66,
            "text": "Never"
        },
        {
            "id": 53,
            "start": 32.66,
            "end": 33.18,
            "text": "mind."
        },
        {
            "id": 54,
            "start": 33.18,
            "end": 33.46,
            "text": "How"
        },
        {
            "id": 55,
            "start": 33.46,
            "end": 33.91,
            "text": "long"
        },
        ...
    ]
}
example input - baseText
Call me Ishmael. Some years ago—never mind how long precisely—having ...
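import fs from 'fs';
import transposeWords from 'transpose-timestamps';

// STT word list with timecodes, as in the example transcript input
const mobyTranscript = require('../sample/data/moby-dick-chapter-1/words.json');
// Accurate, human-produced base text, as in the example baseText input
const mobyText = fs.readFileSync('./sample/data/moby-dick-chapter-1/text.txt').toString();

// Transpose the base text words onto the STT timecodes
const alignedWords = transposeWords({ baseText: mobyText, transcript: mobyTranscript });

// Do something with the aligned words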
example output
{
  "words": [
    {
      "id": 46,
      "start": 29.11,
      "end": 29.41,
      "text": "Call"
    },
    {
      "id": 47,
      "start": 29.41,
      "end": 29.63,
      "text": "me"
    },
    {
      "id": 48,
      "start": 29.63,
      "end": 30.35,
      "text": "Ishmael."
    },
    {
      "id": 49,
      "start": 30.9,
      "end": 31.21,
      "text": "Some"
    },
    {
      "id": 50,
      "start": 31.21,
      "end": 31.57,
      "text": "years"
    },
    {
      "id": 51,
      "start": 31.57,
      "end": 32.13,
      "text": "ago"
    },
    {
      "id": 52,
      "start": 32.29,
      "end": 32.66,
      "text": "never"
    },
    {
      "id": 53,
      "start": 32.66,
      "end": 33.18,
      "text": "mind"
    },
    {
      "id": 54,
      "start": 33.18,
      "end": 33.46,
      "text": "how"
    },
    {
      "id": 55,
      "start": 33.46,
      "end": 33.91,
      "text": "long"
    },
    ...
  ]
}

See example usage for more.

Note: because of constraints in the alignment process, words joined by a dash are split into (two) separate words in the result.
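As an illustration of this note, the sketch below splits dash-joined tokens so that each part can receive its own timecode. `splitDashedWords` is a hypothetical helper, not a function exported by this module, and the set of dash characters it treats as separators is an assumption.

```javascript
// Hypothetical helper: NOT part of the transpose-timestamps API.
// Assumption: em dashes, en dashes and hyphens all act as word separators.
function splitDashedWords(text) {
  return text
    .replace(/[\u2014\u2013-]+/g, ' ') // em dash, en dash, hyphen -> space
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading/trailing separators
}

const parts = splitDashedWords('Some years ago\u2014never mind how long precisely\u2014having');
console.log(parts);
```

This matches the example output above, where "ago" and "never" appear as separate word objects even though the base text joins them with a dash.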

System Architecture

Uses word-diff lib for diffing.
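To make the diffing step concrete, here is a minimal, self-contained word-level diff built on a longest-common-subsequence table. It is an illustration only: it is not the word-diff library's implementation or API, and the name `diffWords` and the chunk shapes it returns are assumptions made for this sketch.

```javascript
// Illustrative word-level diff (LCS based). NOT the word-diff library itself;
// the function name and chunk shapes below are assumptions for this sketch.
function diffWords(sttWords, baseWords) {
  const n = sttWords.length;
  const m = baseWords.length;
  // lcs[i][j] = length of the longest common subsequence of
  // sttWords[i..] and baseWords[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = sttWords[i] === baseWords[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const chunks = [];
  let pending = { removed: [], added: [] }; // words collected between matches
  const flush = () => {
    if (pending.removed.length || pending.added.length) {
      chunks.push({ type: 'change', ...pending });
      pending = { removed: [], added: [] };
    }
  };

  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && sttWords[i] === baseWords[j]) {
      flush();
      chunks.push({ type: 'equal', word: baseWords[j] });
      i++;
      j++;
    } else if (j === m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1])) {
      pending.removed.push(sttWords[i++]); // present only in the STT words
    } else {
      pending.added.push(baseWords[j++]); // present only in the base text
    }
  }
  flush();
  return chunks;
}

const diffResult = diffWords(
  ['call', 'me', 'ishmael', 'sum', 'years', 'ago'],
  ['call', 'me', 'ishmael', 'some', 'years', 'ago']
);
console.log(JSON.stringify(diffResult, null, 2));
```

In this sketch a `change` chunk with equally long `removed` and `added` lists corresponds to a replacement, one with only `removed` words to a deletion, and one with only `added` words to an insertion.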

Node version of stt-align by Chris Baume - R&D.

Pseudocode overview of alignSTT:

  • Input and output are as described in the example usage. The inputs are:
    • Accurate base text transcription, as a string.
    • Array of word objects from the STT service's transcription.

Transpose timestamps / alignment

  1. Convert the STT words to a text string.
    1. Normalize the STT words text string.
    2. Normalize the base text string.
  2. Diff the two strings via the word-diff lib, then iterate over the diff results:
    1. Replaced: the number of deleted and inserted words is equal, so the timecodes can be transposed onto the inserted ones.
    2. Deleted / removed words: words present in the STT but not in the base text.
    3. Inserted / added words: words not recognized by the STT but present in the base text.
    4. Equal / text words: words recognized correctly by the STT. Only the timecodes need transposing, so that punctuation, capitalization, etc. are retained.
  3. Compute times for inserted words.
    1. Compute word timings.
      1. Use the start and end time of the section to calculate a weighted average start and end time for each word in the section.
    2. Edge case: the end time for the last word is missing.
      1. Estimate the end time from the start time of the section by using calculateWordDuration (a heuristic function that estimates word duration based on the number of characters) to add up word durations up to the last one.
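
The timing steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: `interpolateTimings` is a hypothetical name, weighting by character count is only one plausible reading of "weighted average", and the per-character duration used in `calculateWordDuration` here is an invented constant, since this README does not describe the real heuristic.

```javascript
// Assumed constant: the README does not state the real value.
const SECONDS_PER_CHAR = 0.08;

// Round to two decimals, matching the precision of the example timecodes.
const round = (n) => Math.round(n * 100) / 100;

// The README names this function; this body is a guess at its behaviour:
// duration grows linearly with the number of characters.
function calculateWordDuration(text) {
  return text.length * SECONDS_PER_CHAR;
}

// Hypothetical helper: spread the words of a section across [start, end],
// giving each word a share of the time proportional to its length.
// If `end` is missing, estimate it by summing heuristic word durations.
function interpolateTimings(words, start, end) {
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  const sectionEnd = end !== undefined && end !== null
    ? end
    : start + words.reduce((sum, w) => sum + calculateWordDuration(w), 0);
  const duration = sectionEnd - start;

  let cursor = start;
  return words.map((text) => {
    const wordStart = cursor;
    cursor += duration * (text.length / totalChars);
    return { start: round(wordStart), end: round(cursor), text };
  });
}

// Section with known boundaries: time is shared out by word length.
console.log(interpolateTimings(['how', 'long', 'precisely'], 33.18, 35.0));

// Edge case: no end time, so it is estimated from the word durations.
console.log(interpolateTimings(['having', 'little'], 40));
```

Weighting by character count keeps short words short and avoids giving every inserted word the same duration, but any monotonic length-based weight would fit the description equally well.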

Documentation

There's a docs folder in this repository.

docs/notes contains draft development notes on various aspects of the project. These are generally converted into either ADRs or guides when ready.

docs/adr contains Architecture Decision Records (ADRs).

An architectural decision record (ADR) is a document that captures an important architectural decision made along with its context and consequences.

We are using this template for ADR

Development env

The Node version is set in the Node Version Manager file, .nvmrc.

nvm use

See the visually inspect results guide for a way to verify the output.

Build

NA

Tests

npm test

Tests are run with Jest.

Deployment