googleapis/google-cloud-php

Speech API: Cannot use explicit_decoding_config with encoding = ENCODING_UNSPECIFIED

jfradj opened this issue · 1 comment

Hello,

I want to use the speech API to convert speech into text.


TL;DR

Using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]); 

throws this error:

Invalid audio channel count value: 0. Values must be non-negative.

While using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding

Long and detailed version for the courageous ones :)

Environment details

  • OS: MacOS Sonoma 14.3 (23D56)
  • PHP version: PHP 8.2.17
  • Package name and version: google/cloud-speech 1.18.2

Steps to reproduce

I'm working with .aac audio files (generated by Instagram).
I tried the online GUI (https://console.cloud.google.com/speech/transcriptions) to check whether the .aac file would be supported, and it worked:
[Screenshot, 2024-05-04 07:41:53]

In the GUI, after uploading the file, I get this warning: "Unable to automatically detect audio information. Please review your audio file and enter the relevant fields manually."
So I filled in the fields manually:

  • Encoding = ENCODING_UNSPECIFIED
  • Sample rate = 16000
  • Channel count remains empty

This worked, as shown in the screenshot above.

Then I wanted to do the same thing in code using the google/cloud-speech package.

I tried to use the auto_decoding_config option but got the following error:

Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.

Which is the same behavior as the GUI.
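
Since both the GUI and auto_decoding_config fail to detect the format, it may help to check what container the downloaded bytes actually use before picking decoding parameters. The following is a minimal sketch, not part of the google/cloud-speech package: the function name sniffAudioContainer and its return labels are hypothetical, and the magic-byte checks are a heuristic that can misclassify unusual files.

```php
<?php
// Hypothetical helper: guesses the audio container from the leading bytes.
// Heuristic only; returns 'unknown' when no known signature matches.
function sniffAudioContainer(string $bytes): string
{
    if (strlen($bytes) < 4) {
        return 'unknown';
    }
    // RIFF....WAVE marks a WAV file.
    if (substr($bytes, 0, 4) === 'RIFF' && substr($bytes, 8, 4) === 'WAVE') {
        return 'wav';
    }
    if (substr($bytes, 0, 4) === 'fLaC') {
        return 'flac';
    }
    if (substr($bytes, 0, 4) === 'OggS') {
        return 'ogg';
    }
    // MP4-family files (including .m4a) carry an 'ftyp' box at offset 4.
    if (substr($bytes, 4, 4) === 'ftyp') {
        return 'mp4/m4a';
    }
    // An ID3 tag usually precedes MP3 data.
    if (substr($bytes, 0, 3) === 'ID3') {
        return 'mp3';
    }
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);
    // ADTS AAC: 12-bit sync word 0xFFF with the two layer bits set to 00.
    if ($b0 === 0xFF && ($b1 & 0xF6) === 0xF0) {
        return 'aac-adts';
    }
    // MPEG audio frame: 11-bit sync word (layer bits are non-zero here).
    if ($b0 === 0xFF && ($b1 & 0xE0) === 0xE0) {
        return 'mp3';
    }
    return 'unknown';
}
```

With the code from this report, this could be called as sniffAudioContainer($content) right after file_get_contents(), to see whether the file is raw ADTS AAC or AAC inside an MP4 container.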

So I tried to use the explicit_decoding_config parameter and it failed.
See code below.

Code example

$audioFile = 'https://lookaside.fbsbx.com/ig_messaging_cdn/?asset_id=374095301647771&signature=AbxHJBUywVeA26a-1lSTIeODgXgrAsmxD7pCjaxDo7nNowZZvgE_3fC5jMA3H-9UX7AtT7vdNe3N772RgQpNbgBsvmfp3eT439xW14QykJsqVfvg0aC_GVOJ6sBLBhqDyEzDv7Vt08pCStD0dHvG7PHcL7Gp4RvddKRT_TSYVBQP3PTFPiECX9PsMK528lRG4FaYYIAXN4sBcyeIZsRK6EiiWxo_6g';

$client = new Google\Cloud\Speech\V2\Client\SpeechClient();

$content = file_get_contents($audioFile);

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]);

$config = new Google\Cloud\Speech\V2\RecognitionConfig([
    'explicit_decoding_config' => $explicitConfig,
    'language_codes' => ['en-EN'],
    'model' => 'latest_long',
]);

$request = new Google\Cloud\Speech\V2\RecognizeRequest([
    'recognizer' => 'projects/{MY_PROJECT_ID}/locations/global/recognizers/_',
    'config' => $config,
    'content' => $content,
]);

$response = $client->recognize($request);
$results = $response->getResults();

foreach ($results as $result) {
    $alternatives = $result->getAlternatives();
    $mostLikely = $alternatives[0];
    $transcript = $mostLikely->getTranscript();
    $confidence = $mostLikely->getConfidence();
    printf('Transcript: %s' . PHP_EOL, $transcript);
    printf('Confidence: %s' . PHP_EOL, $confidence);
}

This code throws the following error:

Invalid audio channel count value: 0. Values must be non-negative.

And setting the audio channel like this:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding

Thanks for your help.

Regards,
Johann

@jfradj Hello!

Thanks for reaching out!

I debugged your example and I see the same behaviour. It seems that the UI may behave differently from the API. I recommend converting the audio to another encoding. Some users have had success converting it to WAV:
https://stackoverflow.com/questions/51559252/google-cloud-speech-recognition-api-php-file-encoding-issue
https://stackoverflow.com/questions/47783730/google-cloud-speech-transcription-for-the-aac-encoding

And some resources on changing the encoding:
https://stackoverflow.com/questions/14994342/how-to-convert-audio-to-wav-format-on-upload
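
As a rough sketch of the conversion route, assuming ffmpeg is installed and on the PATH (the thread does not name a tool, so this is an assumption), the file could be transcoded to 16-bit PCM WAV before being sent. The helper names buildFfmpegCommand, convertToWav and wavRecognitionConfig are hypothetical. The last function requires google/cloud-speech, uses 'en-US' as a placeholder language code, and has not been verified against the live API.

```php
<?php
// Hypothetical helper: builds an ffmpeg command that transcodes the input
// to 16-bit little-endian PCM WAV at the given sample rate and channel count.
function buildFfmpegCommand(string $input, string $output, int $sampleRate = 16000, int $channels = 1): string
{
    return sprintf(
        'ffmpeg -y -i %s -ac %d -ar %d -c:a pcm_s16le %s',
        escapeshellarg($input),
        $channels,
        $sampleRate,
        escapeshellarg($output)
    );
}

// Hypothetical helper: converts a local audio file and returns the WAV bytes.
function convertToWav(string $input): string
{
    $output = sys_get_temp_dir() . '/' . uniqid('speech_', true) . '.wav';
    $log = [];
    exec(buildFfmpegCommand($input, $output) . ' 2>&1', $log, $exitCode);
    if ($exitCode !== 0) {
        throw new RuntimeException("ffmpeg failed:\n" . implode("\n", $log));
    }
    $bytes = file_get_contents($output);
    unlink($output);
    if ($bytes === false) {
        throw new RuntimeException('Could not read converted file');
    }
    return $bytes;
}

// Sketch only: a V2 config that leaves format detection to the service,
// which should be able to read the WAV header produced above.
function wavRecognitionConfig(): Google\Cloud\Speech\V2\RecognitionConfig
{
    return new Google\Cloud\Speech\V2\RecognitionConfig([
        'auto_decoding_config' => new Google\Cloud\Speech\V2\AutoDetectDecodingConfig(),
        'language_codes' => ['en-US'], // placeholder; use your own language
        'model' => 'latest_long',
    ]);
}
```

The bytes returned by convertToWav() would then replace $content in the RecognizeRequest, with wavRecognitionConfig() in place of the explicit decoding config.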

Hope this helps!

Closing the issue, as it is related to the API behaviour rather than the PHP library itself.

Feel free to comment if you have more questions or require more help!