[Invalid][Cloud] Speaker tag is not accurate
balavenkatesh3322 opened this issue · 1 comment
Describe the bug
I tested speaker diarization on my audio file, and the speaker tags are not accurate. I have attached the audio file (speaker_tag issue.wav) and my Python code.
Is there a problem with my Python code or with the audio file?
To Reproduce
This is my Python code for speaker diarization:
from google.cloud import speech_v1p1beta1 as speech
from google.oauth2 import service_account
import os

# Build the client from the service-account key file named in the environment.
client = speech.SpeechClient(
    credentials=service_account.Credentials.from_service_account_file(
        os.getenv("GOOGLE_APPLICATION_CREDENTIALS")))

# audio = speech.types.RecognitionAudio(content=content)
audio = speech.types.RecognitionAudio(uri='STORAGE_AUDIO_URL')

config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=48000,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)

operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=1000)

# Take the words (with their speaker tags) from the last result.
result = response.results[-1]
words_info = result.alternatives[0].words

# Printing out the output:
for word_info in words_info:
    print("word: '{}', speaker_tag: {}".format(word_info.word,
                                               word_info.speaker_tag))
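To make the per-word tags easier to read, the words can be collapsed into consecutive speaker turns. This is only a minimal sketch: `group_by_speaker` is a hypothetical helper, not part of google-cloud-speech, and it assumes the API results have already been converted into plain `(word, speaker_tag)` tuples.

```python
from itertools import groupby


def group_by_speaker(words):
    """Collapse (word, speaker_tag) pairs into consecutive speaker turns."""
    turns = []
    # groupby only merges adjacent items, so each run is one uninterrupted turn.
    for tag, run in groupby(words, key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in run)))
    return turns


# A short excerpt taken from the output in this issue.
sample = [("and", 2), ("my", 1), ("throat", 2), ("really", 2), ("itchy", 2)]

for tag, text in group_by_speaker(sample):
    print("speaker {}: {}".format(tag, text))
```

With real results, the input could be built as `[(w.word, w.speaker_tag) for w in words_info]`.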
Data samples
Audio file: Google Drive link here
Output for the above audio file:
word: 'he', speaker_tag: 2
word: 'sighed', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'brings', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'today', speaker_tag: 2
word: 'I', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'my', speaker_tag: 1
word: 'throat', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'itchy', speaker_tag: 2
word: 'okay', speaker_tag: 2
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'check', speaker_tag: 2
word: 'seems', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'one', speaker_tag: 2
word: 'or', speaker_tag: 2
word: 'two', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'to', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'any', speaker_tag: 2
word: 'medication', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'dosage', speaker_tag: 2
word: 'will', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'animal', speaker_tag: 1
word: 'okay', speaker_tag: 1
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'look', speaker_tag: 2
word: 'at', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'it's', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'got', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'shot', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'intensity', speaker_tag: 2
word: 'might', speaker_tag: 2
word: 'be', speaker_tag: 2
word: 'low', speaker_tag: 2
word: 'why', speaker_tag: 2
word: 'don't', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'continue', speaker_tag: 2
word: 'taking', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'Tylenol', speaker_tag: 2
word: 'for', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'draw', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'write', speaker_tag: 2
word: 'some', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'syrup', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'if', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'pharmacy', speaker_tag: 2
word: 'thank', speaker_tag: 2
word: 'you', speaker_tag: 2
The above output does not match the audio file. Almost every word is tagged as speaker 2, even though the recording contains two speakers. Please check the audio file against the output.
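One way to quantify the imbalance is to tally how many words were assigned to each speaker tag. The following is a rough sketch that parses printed lines in the format shown in the output; `tally_speaker_tags` and `LINE_RE` are hypothetical names introduced for illustration, and the excerpt is just a few lines copied from that output.

```python
import re
from collections import Counter

# Matches lines such as: word: 'my', speaker_tag: 1
LINE_RE = re.compile(r"word: '(?P<word>.*)', speaker_tag: (?P<tag>\d+)")


def tally_speaker_tags(lines):
    """Count how many printed words were assigned to each speaker tag."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip lines that are not in the expected format
            counts[int(match.group("tag"))] += 1
    return counts


excerpt = [
    "word: 'and', speaker_tag: 2",
    "word: 'my', speaker_tag: 1",
    "word: 'throat', speaker_tag: 2",
    "word: 'it's', speaker_tag: 2",
]

print(tally_speaker_tags(excerpt))
```

Run over the full output, a tally like this shows how few words were assigned to speaker 1.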
Versions
google-cloud-speech==0.36.0
This question is about the Google Cloud diarization API, which is completely unrelated to UIS-RNN.
Please contact their customer service.