octimot/StoryToolkitAI

Advanced Voice Recognition and Tagging System for Multi-Speaker Audio Files

Closed this issue · 6 comments

This enhancement would be particularly beneficial for transcribing meetings, interviews, gaming sessions, and podcasts involving multiple speakers, letting users easily distinguish who is speaking at any given time.

Speaker Recognition/Diarization is at the top of the to-do list, right after we finish some new Assistant features.

What do you mean by "Advanced"? And how do you see the "tagging system"?

Cheers!

Mainly these:

Advanced Speaker Recognition: Utilizes high-end technology for precise identification of individual speakers in complex audio.
Tagging System: Automatically labels audio segments with speaker names for easy tracking in recordings.
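To make the tagging idea concrete, here is a minimal sketch in Python. It assumes a diarization step has already produced speaker turns with start and end times; the names `Turn` and `tag_segments` and the segment dictionary layout are hypothetical and not part of StoryToolkitAI. Each transcript segment gets the speaker whose turns overlap it the most.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A hypothetical diarization result: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def tag_segments(segments, turns, unknown="Unknown"):
    """Return copies of the segments with a 'speaker' key added.

    Each segment is a dict with 'start', 'end' and 'text' (assumed layout).
    The speaker with the largest total overlap wins; segments that overlap
    no turn at all are labeled with the `unknown` placeholder.
    """
    tagged = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        tagged.append({**seg, "speaker": speaker})
    return tagged


if __name__ == "__main__":
    turns = [Turn(0.0, 2.5, "Alice"), Turn(2.5, 6.0, "Bob")]
    segments = [
        {"start": 0.0, "end": 2.0, "text": "Shall we start?"},
        {"start": 2.0, "end": 5.0, "text": "Yes, go ahead."},
    ]
    for seg in tag_segments(segments, turns):
        print(f'{seg["speaker"]}: {seg["text"]}')
```

Mapping the labels to real names would still need a user-supplied lookup, or actual speaker recognition.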

We're currently testing speaker change detection, so this will probably be available in version 0.23 or 0.24:

[Screenshot: 2024-01-08 at 14:41:41]

However, actual speaker recognition is a bit more complex, and I'm not sure that this can be done locally without many changes to the tool. It's planned though!

I've added Speaker Detection via the Ingest or Transcription window in version 0.23.0.

As I mentioned in the docs, the model is basic, but it does try to match speakers throughout the transcription. A more advanced implementation will come at some point!
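For readers wondering what "matching speakers throughout the transcription" can look like, here is a rough Python sketch of one common approach. It is not the model StoryToolkitAI uses. It assumes each segment already has a voice embedding (a list of floats from some hypothetical upstream model) and reuses a speaker label whenever a new embedding is similar enough to one seen earlier. The function names and the 0.8 threshold are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_speakers(embeddings, threshold=0.8):
    """Greedily assign 'Speaker N' labels to a sequence of segment embeddings.

    The first embedding of each new speaker is kept as that speaker's reference.
    A later embedding reuses the label of the most similar reference if the
    similarity reaches `threshold` (an assumed value); otherwise it starts a
    new speaker.
    """
    references = []
    labels = []
    for emb in embeddings:
        best_idx, best_sim = None, threshold
        for i, ref in enumerate(references):
            sim = cosine(emb, ref)
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None:
            references.append(emb)
            best_idx = len(references) - 1
        labels.append(f"Speaker {best_idx + 1}")
    return labels


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.95]]
    print(match_speakers(toy))
```

A greedy pass like this is sensitive to the threshold and to the order of the segments, which is one reason a basic model can mislabel speakers in longer recordings.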

Cheers

Amazing! Thank you! Do I need to re-download to get this feature? I didn't see a way to update directly to version 0.23.0 from within StoryToolkit.

Right now, version 0.23.0 is available by installing from source.

A standalone release is coming in a few days as an early release for Patreon members, and we'll make it publicly available with the next version release.

Cheers!