
Documentation page for the GoogleSpeechKit Unreal Engine plugin

UE4 Google Speech Kit

This is a UE4 wrapper for Google's Cloud Text-to-Speech and synchronous Cloud Speech-to-Text speech recognition.

The plugin was battle-tested in several commercial simulator projects. It is small, lean, and simple to use.


Engine preparation

To make the microphone work, you need to add the following lines to the project's DefaultEngine.ini.

[Voice]
bEnabled=true

To avoid losing pauses between words, you probably want to tune the silence detection threshold voice.SilenceDetectionThreshold; a value of 0.01 works well. This also goes into DefaultEngine.ini.

[SystemSettings]
voice.SilenceDetectionThreshold=0.01

Starting from engine version 4.25, also add

voice.MicNoiseGateThreshold=0.01

Other voice-related variables worth experimenting with:

voice.MicNoiseGateThreshold
voice.MicInputGain
voice.MicStereoBias
voice.MicNoiseAttackTime
voice.MicNoiseReleaseTime
voice.SilenceDetectionAttackTime
voice.SilenceDetectionReleaseTime

To find the available settings, type voice. in the editor console and an autocompletion widget will pop up.

Console variables can also be modified at runtime, for example with the Execute Console Command blueprint node.
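A minimal C++ sketch of the same runtime tweak (the values here are examples; tune them to your microphone):

#include "Kismet/KismetSystemLibrary.h"

void AdjustVoiceCVars(UObject* WorldContextObject)
{
    // Example values; tune them to your microphone.
    UKismetSystemLibrary::ExecuteConsoleCommand(
        WorldContextObject, TEXT("voice.SilenceDetectionThreshold 0.01"));
    UKismetSystemLibrary::ExecuteConsoleCommand(
        WorldContextObject, TEXT("voice.MicNoiseGateThreshold 0.01"));
}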

To debug your microphone input, you can convert the output sound buffer to an Unreal sound wave and play it.
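As a sketch (assuming 16-bit mono PCM; match the sample rate to your capture settings), you can queue the captured bytes into a procedural sound wave and play it:

#include "Sound/SoundWaveProcedural.h"
#include "Kismet/GameplayStatics.h"

void PlayCapturedBuffer(UObject* WorldContextObject, const TArray<uint8>& PcmData)
{
    USoundWaveProcedural* Wave = NewObject<USoundWaveProcedural>();
    Wave->SetSampleRate(16000); // must match the capture sample rate
    Wave->NumChannels = 1;      // mono microphone input
    Wave->QueueAudio(PcmData.GetData(), PcmData.Num());
    UGameplayStatics::PlaySound2D(WorldContextObject, Wave);
}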

The threshold values above may differ depending on the actual characteristics of your microphone.

Cloud preparation

  1. Go to Google Cloud and create a billing account.
  2. Enable Cloud Speech-to-Text API and Cloud Text-to-Speech API.
  3. Create credentials to access your enabled APIs. See Google's instructions on creating API keys.

  4. There are two ways to use your credentials in the project.

    • 4.1 Using an environment variable. Create an environment variable GOOGLE_API_KEY with the created key as its value.

    • 4.2 Assigning the key directly in blueprints. This can be called anywhere.

    By default, you need to set the API key from the nodes. To use the environment variable instead, set Use Env Variable to true.
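If the environment variable route does not seem to work, a quick sanity check (a sketch, not part of the plugin API) is to ask the engine for the variable directly:

#include "CoreMinimal.h"
#include "HAL/PlatformMisc.h"

void CheckGoogleApiKey()
{
    // An empty result means the variable is not visible to this process.
    const FString Key = FPlatformMisc::GetEnvironmentVariable(TEXT("GOOGLE_API_KEY"));
    if (Key.IsEmpty())
    {
        UE_LOG(LogTemp, Warning, TEXT("GOOGLE_API_KEY is not set."));
    }
}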

ADVICE: Pay attention to security and encrypt your assets before packaging.

Speech synthesis

You need to supply text to the async node, as well as a voice variant, speech speed, pitch value, and optionally audio effects. As output you get a sound wave object which can be played by the engine.
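For reference, here is a rough C++ sketch of the raw Cloud Text-to-Speech REST call that such a node wraps (the endpoint and JSON field names are Google's; everything else is illustrative, and the node may build its request differently). It assumes the "HTTP" module is listed in your Build.cs:

#include "HttpModule.h"
#include "Interfaces/IHttpRequest.h"
#include "Interfaces/IHttpResponse.h"

void SynthesizeSpeechRaw(const FString& ApiKey)
{
    auto Request = FHttpModule::Get().CreateRequest();
    Request->SetURL(TEXT("https://texttospeech.googleapis.com/v1/text:synthesize?key=") + ApiKey);
    Request->SetVerb(TEXT("POST"));
    Request->SetHeader(TEXT("Content-Type"), TEXT("application/json"));
    // "name" is the voice variant, "speakingRate" and "pitch" the speed and
    // pitch inputs; audio effects would go into "effectsProfileId".
    Request->SetContentAsString(TEXT(R"({
        "input": {"text": "Hello from Unreal"},
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
        "audioConfig": {"audioEncoding": "LINEAR16", "speakingRate": 1.0, "pitch": 0.0}
    })"));
    Request->OnProcessRequestComplete().BindLambda(
        [](FHttpRequestPtr, FHttpResponsePtr Response, bool bSucceeded)
        {
            // On success the response JSON carries base64-encoded audio in "audioContent".
        });
    Request->ProcessRequest();
}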

Speech recognition

Speech recognition consists of two parts: voice capture and sending the request. There are two ways to capture your voice, depending on your needs.

Grant permissions

Windows

No action needed

Mac

  1. In Xcode, select your project
  2. Go to Info tab
  3. Expand Custom macOS Application Target Properties section
  4. Hit +, add a Privacy - Microphone Usage Description string key, and set any value you want, for example "GoogleSpeechKitMicAccess"

Android

Request the following permissions somewhere on BeginPlay; a C++ sketch follows the list.

  1. Give microphone access (android.permission.RECORD_AUDIO)
  2. Give disk read access (android.permission.READ_EXTERNAL_STORAGE)
  3. Give disk write access (android.permission.WRITE_EXTERNAL_STORAGE)
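A minimal sketch using the engine's AndroidPermission plugin (enable it first); this is the C++ counterpart of the usual blueprint permission request nodes:

#if PLATFORM_ANDROID
#include "AndroidPermissionFunctionLibrary.h"
#endif

void RequestSpeechPermissions()
{
#if PLATFORM_ANDROID
    TArray<FString> Permissions;
    Permissions.Add(TEXT("android.permission.RECORD_AUDIO"));
    Permissions.Add(TEXT("android.permission.READ_EXTERNAL_STORAGE"));
    Permissions.Add(TEXT("android.permission.WRITE_EXTERNAL_STORAGE"));
    // Shows the system permission dialogs for anything not yet granted.
    UAndroidPermissionFunctionLibrary::AcquirePermissions(Permissions);
#endif
}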

Voice capture and speech recognition

Windows-only method (deprecated)

Use the provided MicrophoneCapture actor component. Next, construct recognition parameters and pass them to the Google STT async node.


Cross-platform method (use this instead)

  1. Create a Sound Submix.

    1. Right click in the content browser - Sounds > Mix > Sound Submix
    2. Open it, and set the output volume to -96.0 so you don't hear your own microphone input
  2. Create a sound class.

    1. Right click in the content browser - Sounds > Classes > Sound Class
    2. Open it, and set the submix created in the previous step as the sound class's default submix
  3. Make sure the Audio Capture plugin is enabled

  4. Go to your actor and add an AudioCapture component in the Components tab

  5. Disable the "Auto Activate" option on the AudioCapture component

  6. Set the created sound class on the AudioCapture component

  7. Now we can drop some nodes. To start and stop recording, we use the Activate and Deactivate nodes with the previously added AudioCapture component as the target. When audio capture is activated, we can start recording output to our submix

  8. When audio capture is deactivated, we finish recording output to a wav file. This is important! Give your wav file a name (e.g. "stt_sample"); Path can be absolute, or relative (to the /Saved/BouncedWavFiles folder)

  9. Then, after a small delay, we can read the saved file back as byte samples, ready to be fed to the Google STT node. The delay is needed because the "Finish Recording Output" node writes the sound to disk, and this file write takes some time; if we proceed immediately, the ReadWaveFile node will fail

Here is the whole setup
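For reference, a rough C++ equivalent of the capture flow, using the engine's AudioMixerBlueprintLibrary (a sketch; the blueprint setup above is the supported path):

#include "AudioMixerBlueprintLibrary.h"
#include "AudioCaptureComponent.h"

void StartCapture(UObject* Ctx, UAudioCaptureComponent* Capture, USoundSubmixBase* Submix)
{
    Capture->Activate();
    // Begin recording the submix output; the duration is only a hint.
    UAudioMixerBlueprintLibrary::StartRecordingOutput(Ctx, /*ExpectedDuration*/ 10.f, Submix);
}

void StopCapture(UObject* Ctx, UAudioCaptureComponent* Capture, USoundSubmixBase* Submix)
{
    Capture->Deactivate();
    // With an empty Path this writes <Project>/Saved/BouncedWavFiles/stt_sample.wav.
    UAudioMixerBlueprintLibrary::StopRecordingOutput(
        Ctx, EAudioRecordingExportType::WavFile, TEXT("stt_sample"), TEXT(""), Submix);
}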


There is another STT node, Google STT Variants, which, instead of returning the single result with the highest confidence, returns an array of variants.

Utilities

Percentage-based string comparison (fuzzy matching)

You will probably need to process the recognised voice in your app; to increase recognition chances, use the CompareStrings node. For example, a call that returns 0.666 means the strings are 66% similar, so we can treat them as equal. The node utilizes the Levenshtein distance algorithm.
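The idea behind the node, sketched in C++ (an illustration, not the plugin's actual code): compute the Levenshtein distance and normalize it by the longer string's length:

#include "CoreMinimal.h"

// Similarity = 1 - LevenshteinDistance(A, B) / Max(Len(A), Len(B)).
float StringSimilarity(const FString& A, const FString& B)
{
    const int32 M = A.Len();
    const int32 N = B.Len();
    if (M == 0 && N == 0) return 1.0f;

    // Two-row dynamic programming table for the edit distance.
    TArray<int32> Prev, Curr;
    Prev.SetNum(N + 1);
    Curr.SetNum(N + 1);
    for (int32 j = 0; j <= N; ++j) Prev[j] = j;

    for (int32 i = 1; i <= M; ++i)
    {
        Curr[0] = i;
        for (int32 j = 1; j <= N; ++j)
        {
            const int32 Cost = (A[i - 1] == B[j - 1]) ? 0 : 1;
            Curr[j] = FMath::Min3(Prev[j] + 1, Curr[j - 1] + 1, Prev[j - 1] + Cost);
        }
        Swap(Prev, Curr);
    }
    return 1.0f - static_cast<float>(Prev[N]) / static_cast<float>(FMath::Max(M, N));
}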

Listing available capture devices

You can pass a microphone name to the microphone capture component. To get the list of available microphones, enumerate the available capture devices.
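A sketch of doing this from C++ with the engine's AudioCaptureCore module (assuming the module is listed in your Build.cs; the plugin also provides a blueprint setup for this):

#include "CoreMinimal.h"
#include "AudioCaptureCore.h"

void ListCaptureDevices()
{
    Audio::FAudioCapture Capture;
    TArray<Audio::FCaptureDeviceInfo> Devices;
    Capture.GetCaptureDevicesAvailable(Devices);

    for (const Audio::FCaptureDeviceInfo& Info : Devices)
    {
        UE_LOG(LogTemp, Log, TEXT("Capture device: %s"), *Info.DeviceName);
    }
}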

Supported platforms

Windows, Mac and Android.

Migration guide

Version 3.0

The EGoogleTTSLanguage enum was removed. You now need to pass the voice name as a string (the Voice name column).


WARNING: Since the synthesis parameters have changed, the TTS cache is no longer valid! Make sure you remove the TTS cache if it exists; the editor/game can freeze if an old cache is loaded. Remove the PROJECT_ROOT/Saved/GoogleTTSCache folder, or invoke the WipeTTSCache node before the GoogleTTS node is executed!

The reason for this is that the number of languages exceeded 256, which no longer fits into an 8-bit enum (an Unreal limitation).
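If you prefer to clean up from code, a minimal sketch (assuming the default cache location named in the warning above):

#include "HAL/FileManager.h"
#include "Misc/Paths.h"

void RemoveOldTTSCache()
{
    // PROJECT_ROOT/Saved/GoogleTTSCache, per the warning above.
    const FString CacheDir = FPaths::ProjectSavedDir() / TEXT("GoogleTTSCache");
    if (IFileManager::Get().DirectoryExists(*CacheDir))
    {
        IFileManager::Get().DeleteDirectory(*CacheDir, /*RequireExists=*/ false, /*Tree=*/ true);
    }
}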
