A Node module to align Speech To Text (STT) transcript data with a human-accurate base transcript, by transposing the words from the accurate text onto the time-codes of the STT data via a diffing algorithm.
Revisiting the concept of stt-align-node and trying an alternative solution for the same alignment problem.
To be used as part of slate-transcript-editor, which is a component used as part of autoEdit3.
git clone git@github.com:pietrop/transpose-timestamps.git
cd transpose-timestamps
npm install
example input - transcript
{
"words": [
...
{
"id": 46,
"start": 29.11,
"end": 29.41,
"text": "Call"
},
{
"id": 47,
"start": 29.41,
"end": 29.63,
"text": "me"
},
{
"id": 48,
"start": 29.63,
"end": 30.35,
"text": "Ishmael."
},
{
"id": 49,
"start": 30.9,
"end": 31.21,
"text": "Some"
},
{
"id": 50,
"start": 31.21,
"end": 31.57,
"text": "years"
},
{
"id": 51,
"start": 31.57,
"end": 32.13,
"text": "ago."
},
{
"id": 52,
"start": 32.29,
"end": 32.66,
"text": "Never"
},
{
"id": 53,
"start": 32.66,
"end": 33.18,
"text": "mind."
},
{
"id": 54,
"start": 33.18,
"end": 33.46,
"text": "How"
},
{
"id": 55,
"start": 33.46,
"end": 33.91,
"text": "long"
},
...
]
}
example input - baseText
Call me Ishmael. Some years ago—never mind how long precisely—having ...
import fs from 'fs'; // needed for readFileSync below
import transposeWords from 'transpose-timestamps';
// STT word objects (with timecodes) and the accurate base text
const mobyTranscript = require('../sample/data/moby-dick-chapter-1/words.json');
const mobyText = fs.readFileSync('./sample/data/moby-dick-chapter-1/text.txt').toString();
// Transpose the base text words onto the STT timecodes
const alignedWords = transposeWords({ baseText: mobyText, transcript: mobyTranscript });
// Do something with the aligned words
example output
{
"words": [
{
"id": 46,
"start": 29.11,
"end": 29.41,
"text": "Call"
},
{
"id": 47,
"start": 29.41,
"end": 29.63,
"text": "me"
},
{
"id": 48,
"start": 29.63,
"end": 30.35,
"text": "Ishmael."
},
{
"id": 49,
"start": 30.9,
"end": 31.21,
"text": "Some"
},
{
"id": 50,
"start": 31.21,
"end": 31.57,
"text": "years"
},
{
"id": 51,
"start": 31.57,
"end": 32.13,
"text": "ago"
},
{
"id": 52,
"start": 32.29,
"end": 32.66,
"text": "never"
},
{
"id": 53,
"start": 32.66,
"end": 33.18,
"text": "mind"
},
{
"id": 54,
"start": 33.18,
"end": 33.46,
"text": "how"
},
{
"id": 55,
"start": 33.46,
"end": 33.91,
"text": "long"
},
...
]
}
See example usage for more.
Note: Because of constraints in the alignment process, words joined by a dash (-) are split into (two) separate words in the result.
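To illustrate the note above, here is a minimal sketch of how dash splitting could look. The helper name `splitDashedWords` and the choice to treat hyphens, en dashes and em dashes alike are assumptions made for this example; the module's own normalisation may differ.

```javascript
// Hypothetical helper, not part of the module's API:
// dashes become spaces, so a dashed word is tokenised as separate words.
function splitDashedWords(text) {
  return text
    .replace(/[-\u2013\u2014]+/g, ' ') // hyphen, en dash, em dash -> space
    .split(/\s+/)
    .filter(Boolean); // drop empty tokens from leading/trailing spaces
}

console.log(splitDashedWords('a well-known fact'));
// prints [ 'a', 'well', 'known', 'fact' ]
```

Each resulting word then gets its own timecodes, which is why the output contains more word objects than the base text has dashed words.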
Uses word-diff lib for diffing.
Node version of stt-align by Chris Baume - R&D.
Pseudocode overview of alignSTT:
- input, output as described in the example usage.
- Accurate base text transcription, string.
- Array of word objects transcription from STT service.
Transpose timestamps / alignment
- convert stt words to text string
- normalize stt words text string
- normalize base text string
- Diff the two strings via the word-diff lib, then iterate over the results of the diffing
  - Replaced. When the number of deleted and inserted words is equal, the timecodes can be transposed directly onto the inserted ones.
  - Deleted / remove words. Words present in STT but not in base text.
  - Inserted / add words. Words not recognised by STT but present in base text.
  - Equal / text words. Words recognized correctly by STT. Only need to transpose timecodes, to retain punctuation, capitalization etc.
- Compute times for inserted words
  - Compute word timings
    - Use the start and end time of the section to calculate a weighted average start and end time for each word in the section.
    - Edge case: missing end time for the last word.
      - Calculate/estimate the end time from the start time of the section by using calculateWordDuration (a heuristic function that estimates word duration based on the number of characters) to add up word durations up to the last one.
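The timing step for inserted words can be sketched as below. This is one possible reading of the overview, not the module's actual code: `estimateWordDuration` stands in for `calculateWordDuration`, the 0.08 seconds-per-character rate is an assumed value, and `interpolateWordTimings` is a hypothetical name. Each word receives a share of the section proportional to its estimated duration; when the section has no end time, the end is estimated by summing the word durations from the section start.

```javascript
// Hypothetical stand-in for calculateWordDuration: estimate a word's
// duration from its character count. The rate is an assumption.
function estimateWordDuration(text, secondsPerChar = 0.08) {
  return text.length * secondsPerChar;
}

// Spread the section [sectionStart, sectionEnd] across the words, weighting
// each word by its estimated duration. If sectionEnd is undefined, estimate
// it as sectionStart plus the sum of the estimated word durations.
function interpolateWordTimings(texts, sectionStart, sectionEnd) {
  const durations = texts.map((text) => estimateWordDuration(text));
  const total = durations.reduce((sum, d) => sum + d, 0);
  const end = sectionEnd === undefined ? sectionStart + total : sectionEnd;
  const scale = total > 0 ? (end - sectionStart) / total : 0;

  let cursor = sectionStart;
  return texts.map((text, i) => {
    const start = cursor;
    cursor += durations[i] * scale; // this word's share of the section
    return { start, end: cursor, text };
  });
}

// Section bounded by known timecodes on both sides
console.log(interpolateWordTimings(['never', 'mind'], 32.29, 33.18));

// Edge case: no end time available for the last word
console.log(interpolateWordTimings(['ago'], 10));
```

Weighting by character count keeps longer words from being squeezed into the same slot as short ones, at the cost of ignoring pauses inside the section.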
There's a docs folder in this repository.
docs/notes contains dev draft notes on various aspects of the project. This would generally be converted either into ADRs or guides when ready.
docs/adr contains Architecture Decision Records.
An architectural decision record (ADR) is a document that captures an important architectural decision made along with its context and consequences.
We are using this template for ADR
- npm > 6.1.0
- Node 12
Node version is set in node version manager .nvmrc
nvm use
See the visually inspect results guide for a way to verify the output.
NA
npm test
With jest