google/uis-rnn

[Invalid][Cloud] Speaker tag is not accurate

balavenkatesh3322 opened this issue · 1 comment

Describe the bug

I tested speaker diarization on my audio file and the speaker tags are not accurate. I have attached the audio file (speaker_tag issue.wav) and my Python code.
Is there a problem with my Python code or with the audio file?

To Reproduce

This is my Python code for speaker diarization.

from google.cloud import speech_v1p1beta1 as speech
from google.oauth2 import service_account
import os

# Build the client from the service account key file named by GOOGLE_APPLICATION_CREDENTIALS.
client = speech.SpeechClient(credentials=service_account.Credentials.from_service_account_file(os.getenv("GOOGLE_APPLICATION_CREDENTIALS")))

# Alternative for local audio bytes (not used here):
# audio = speech.types.RecognitionAudio(content=content)
audio = speech.types.RecognitionAudio(uri='STORAGE_AUDIO_URL')

# Request diarization with two expected speakers.
config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=48000,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)

# Start the asynchronous recognition and wait for it to finish.
operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=1000)

# With diarization enabled, the last result holds all words with their speaker tags.
result = response.results[-1]
words_info = result.alternatives[0].words

# Printing out the output:
for word_info in words_info:
    print("word: '{}', speaker_tag: {}".format(word_info.word,
                                               word_info.speaker_tag))
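
To make mislabeled stretches easier to spot than in a word-by-word listing, consecutive words with the same tag can be collapsed into speaker turns. This is only a minimal sketch: `group_turns` is a hypothetical helper (not part of google-cloud-speech), and it assumes each word is available as a plain `(word, speaker_tag)` pair. The sample pairs are copied from the output reported in this issue, not fetched from the API.

```python
from itertools import groupby


def group_turns(tagged_words):
    """Collapse consecutive (word, speaker_tag) pairs into (speaker_tag, text) turns."""
    turns = []
    # groupby only merges adjacent items, so each speaker change starts a new turn.
    for tag, group in groupby(tagged_words, key=lambda pair: pair[1]):
        turns.append((tag, " ".join(word for word, _ in group)))
    return turns


# Sample pairs taken from the reported output (hypothetical input shape).
sample = [("and", 2), ("my", 1), ("throat", 2), ("really", 2), ("itchy", 2)]

for tag, text in group_turns(sample):
    print("speaker {}: {}".format(tag, text))
```

With the script above, the pairs could presumably be built as `[(w.word, w.speaker_tag) for w in words_info]`.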

Data samples

Audio file google drive link here

Output for the above audio file:
word: 'he', speaker_tag: 2
word: 'sighed', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'brings', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'today', speaker_tag: 2
word: 'I', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'severe', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'my', speaker_tag: 1
word: 'throat', speaker_tag: 2
word: 'really', speaker_tag: 2
word: 'itchy', speaker_tag: 2
word: 'okay', speaker_tag: 2
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'check', speaker_tag: 2
word: 'seems', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'have', speaker_tag: 2
word: 'one', speaker_tag: 2
word: 'or', speaker_tag: 2
word: 'two', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'to', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'any', speaker_tag: 2
word: 'medication', speaker_tag: 2
word: 'what', speaker_tag: 2
word: 'dosage', speaker_tag: 2
word: 'will', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'animal', speaker_tag: 1
word: 'okay', speaker_tag: 1
word: 'let', speaker_tag: 2
word: 'me', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'look', speaker_tag: 2
word: 'at', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'it's', speaker_tag: 2
word: 'like', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'got', speaker_tag: 2
word: 'a', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'did', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'take', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'flu', speaker_tag: 2
word: 'shot', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'intensity', speaker_tag: 2
word: 'might', speaker_tag: 2
word: 'be', speaker_tag: 2
word: 'low', speaker_tag: 2
word: 'why', speaker_tag: 2
word: 'don't', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'continue', speaker_tag: 2
word: 'taking', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'Tylenol', speaker_tag: 2
word: 'for', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'draw', speaker_tag: 2
word: 'temperature', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'your', speaker_tag: 2
word: 'headache', speaker_tag: 2
word: 'and', speaker_tag: 2
word: 'write', speaker_tag: 2
word: 'some', speaker_tag: 2
word: 'cough', speaker_tag: 2
word: 'syrup', speaker_tag: 2
word: 'so', speaker_tag: 2
word: 'if', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'you', speaker_tag: 2
word: 'can', speaker_tag: 2
word: 'get', speaker_tag: 2
word: 'it', speaker_tag: 2
word: 'in', speaker_tag: 2
word: 'the', speaker_tag: 2
word: 'pharmacy', speaker_tag: 2
word: 'thank', speaker_tag: 2
word: 'you', speaker_tag: 2

The above output does not match the audio file. Almost all words are tagged as speaker 2 (only 3 of the 107 words are tagged as speaker 1). Please check the audio file against the output.
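
The imbalance between the two tags can be checked by tallying the printed lines. The following is a rough sketch, assuming the lines follow exactly the `word: '...', speaker_tag: N` format printed by the script; `count_speaker_tags` and `LINE_RE` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Matches one printed line; the greedy group tolerates apostrophes inside the word.
LINE_RE = re.compile(r"^word: '(.*)', speaker_tag: (\d+)$")


def count_speaker_tags(lines):
    """Count how many printed words were assigned to each speaker tag."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip lines that are not in the expected format
            counts[int(match.group(2))] += 1
    return counts


# A few lines copied from the output above.
sample_lines = [
    "word: 'and', speaker_tag: 2",
    "word: 'my', speaker_tag: 1",
    "word: 'throat', speaker_tag: 2",
    "word: 'it's', speaker_tag: 2",
]

print(count_speaker_tags(sample_lines))
```

A tally that is heavily skewed toward one tag for a two-person conversation suggests that the speakers were not separated.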

Versions

google-cloud-speech==0.36.0

This question is about the Google Cloud diarization API, which is completely unrelated to UIS-RNN.

Please contact Google Cloud customer service.