sentence segmentation in Wav2vec2
MisakaMikoto-o opened this issue · 1 comments
MisakaMikoto-o commented
How can I use CTC to segment sentences in Wav2vec2? The example only segments words.
lumaku commented
In the Wav2vec2 example given in README.md, the variable transcripts is a list of two sentences:
transcripts = ["A MAN SAID TO THE UNIVERSE", "SIR I EXIST"]
This should then give you a separate alignment for each sentence:
print(align_with_transcript(audio, transcripts))
# [{'text': 'A MAN SAID TO THE UNIVERSE', 'start': 0.08124999999999993, 'end': 2.034375, 'conf': 0.0},
# {'text': 'SIR I EXIST', 'start': 2.3260775862068965, 'end': 4.078771551724138, 'conf': 0.0}]
The example also gives you the word timestamps.
print(get_word_timestamps(audio))
# [{'text': 'a', 'start': 0.08124999999999993, 'end': 0.5912715517241378, 'conf': 0.9999501323699951},
# {'text': 'man', 'start': 0.5912715517241378, 'end': 0.9219827586206896, 'conf': 0.9409108982174931},
# {'text': 'said', 'start': 0.9219827586206896, 'end': 1.2326508620689656, 'conf': 0.7700278702302796},
# {'text': 'to', 'start': 1.2326508620689656, 'end': 1.3529094827586206, 'conf': 0.5094435178226225},
# {'text': 'the', 'start': 1.3529094827586206, 'end': 1.4831896551724135, 'conf': 0.4580493446392211},
# {'text': 'universe', 'start': 1.4831896551724135, 'end': 2.034375, 'conf': 0.9285054256219009},
# {'text': 'sir', 'start': 2.3260775862068965, 'end': 3.036530172413793, 'conf': 0.0},
# {'text': 'i', 'start': 3.036530172413793, 'end': 3.347198275862069, 'conf': 0.7995760873559864},
# {'text': 'exist', 'start': 3.347198275862069, 'end': 4.078771551724138, 'conf': 0.0}]
However, these timestamps do not represent alignments in the sense of the start and end of a word, but rather the most probable time step of occurrence.
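If you only have word-level entries at hand, one rough way to get sentence spans is to group consecutive words by the word count of each sentence. This is a minimal sketch, not part of the README example: group_words_into_sentences is a hypothetical helper, the word values are rounded from the output above, and because the word timestamps are only the most probable time steps, the resulting boundaries are approximate.

```python
def group_words_into_sentences(word_segments, transcripts):
    """Merge consecutive word-level entries into one entry per sentence.

    Hypothetical helper: assumes word_segments lists the words of all
    sentences in order, one dict per word, as in the output above.
    """
    sentences = []
    index = 0
    for sentence in transcripts:
        n_words = len(sentence.split())
        if n_words == 0:
            raise ValueError("empty sentence in transcripts")
        chunk = word_segments[index:index + n_words]
        if len(chunk) < n_words:
            raise ValueError("fewer word segments than words in transcripts")
        sentences.append({
            "text": sentence,
            "start": chunk[0]["start"],  # start of the first word
            "end": chunk[-1]["end"],     # end of the last word
        })
        index += n_words
    return sentences


# Word entries rounded from the get_word_timestamps output above.
words = [
    {"text": "a", "start": 0.081, "end": 0.591},
    {"text": "man", "start": 0.591, "end": 0.922},
    {"text": "said", "start": 0.922, "end": 1.233},
    {"text": "to", "start": 1.233, "end": 1.353},
    {"text": "the", "start": 1.353, "end": 1.483},
    {"text": "universe", "start": 1.483, "end": 2.034},
    {"text": "sir", "start": 2.326, "end": 3.037},
    {"text": "i", "start": 3.037, "end": 3.347},
    {"text": "exist", "start": 3.347, "end": 4.079},
]
transcripts = ["A MAN SAID TO THE UNIVERSE", "SIR I EXIST"]

sentences = group_words_into_sentences(words, transcripts)
for segment in sentences:
    print(segment)
```

Passing the sentences directly as transcripts, as in the README example, is still the more direct route; this grouping only helps when the word-level output is all that is available.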
Other than that, if your transcripts is a list of words, the example should give you word segments. If you modified the example code, I suggest checking the datatype of transcripts.
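A quick way to run that check is sketched below. describe_transcripts is a hypothetical helper written for this thread, not a function from the example; it only inspects the Python value and assumes that each list entry is treated as one segment, as in the output above.

```python
def describe_transcripts(transcripts):
    """Report how the entries in transcripts are shaped (hypothetical helper)."""
    if isinstance(transcripts, str):
        return "single string, not a list"
    if len(transcripts) == 0:
        return "empty list"
    if not all(isinstance(entry, str) for entry in transcripts):
        return "list with non-string entries"
    if all(len(entry.split()) <= 1 for entry in transcripts):
        # every entry is one word, so each segment would cover one word
        return "list of single words"
    return "list of multi-word utterances"


sentence_list = ["A MAN SAID TO THE UNIVERSE", "SIR I EXIST"]
word_list = "A MAN SAID TO THE UNIVERSE SIR I EXIST".split()

print(describe_transcripts(sentence_list))
print(describe_transcripts(word_list))
```

If the second description shows up for your data, the sentences were probably split into words somewhere before the alignment call.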