akashmjn/tinydiarize

What is the point here?!


I watched the demo video and got confused!
Am I misunderstanding the point of this modification, the '[SPEAKER_TURN]' token?
I mean, couldn't anyone write a line or two of code that appends a '[SPEAKER_TURN]' tag to the end of every transcribed audio segment? Or is there something in the demo video that I didn't pay attention to?
Shouldn't the output instead be something like 'Speaker 1', 'Speaker 2', 'Speaker 3', etc., with the algorithm smart enough to tag a transcribed line with 'Speaker 2' only if that line was actually spoken by speaker 2?