The configuration for the Google DFE speech module is in the res_speech_gdfe.conf
file in the Asterisk configuration directory. It is a standard format Asterisk configuration file.
service_key
- (required) the path to a JSON-format Google service key, or the actual key itself.
endpoint
- (optional) the URL for the DialogFlow API endpoint. Leave blank to use the default, dialogflow.googleapis.com.
vad_voice_threshold
- (optional) the average absolute amplitude of a packet to consider that packet to be 'voice'. The default is 512. Valid range 0-32767.
vad_voice_minimum_duration
- (optional, milliseconds) the cumulative duration of consecutive 'voice' packets to consider the caller to be speaking. The default is 40 (milliseconds). Valid range 0-2147483647.
vad_silence_minimum_duration
- (optional, milliseconds, not implemented) the cumulative duration of consecutive non-'voice' packets to consider the caller to be not speaking. The default is 500 (milliseconds). Valid range 0-2147483647. This setting currently has no effect as the end of speech is determined by DialogFlow.
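As a rough illustration of the options above, a minimal res_speech_gdfe.conf might look like the sketch below. The [general] section name and the key path are assumptions that do not come from this document; check the sample configuration shipped with the module before relying on either. The numeric values are simply the documented defaults.

```ini
; res_speech_gdfe.conf -- illustrative sketch only
; ASSUMPTION: the [general] section name is a guess, not taken from this document.
[general]
; ASSUMPTION: hypothetical path to the JSON service key
service_key = /etc/asterisk/my-service-key.json
; endpoint is left unset so the default (dialogflow.googleapis.com) is used
;endpoint =
; documented default; valid range 0-32767
vad_voice_threshold = 512
; documented default, in milliseconds
vad_voice_minimum_duration = 40
; documented default, in milliseconds; currently has no effect
vad_silence_minimum_duration = 500
```

Only service_key is required, so the remaining lines can be omitted if the defaults are acceptable.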
To access the DialogFlow endpoint via a proxy you must set the environment variable http_proxy to the URL of your proxy. This must be done for the Asterisk process as a whole.
Speech module behavior may be modified by using the SPEECH_ENGINE dialplan function. Available settings are:
session_id
- set a session identifier to use when making DialogFlow API calls. This will be reflected in the history of the agent on the DialogFlow console. A random default value is used if none is provided.
project_id
- set the project identifier to use when making DialogFlow API calls. This setting is required in order to determine which agent to use.
language
- set the language the recognition engine uses for intent detection and prompt generation. The default is en. The engine has no visibility into the channel language -- if the channel language has changed, you must still set the engine language.
voice_threshold
- set the average absolute amplitude of a packet to consider that packet to be 'voice' (see vad_voice_threshold, above).
voice_duration
- set the cumulative duration of consecutive 'voice' packets to consider the caller to be speaking (see vad_voice_minimum_duration, above).
silence_duration
- set the cumulative duration of consecutive non-'voice' packets to consider the caller to be not speaking (see vad_silence_minimum_duration, above).
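The settings above are written with the standard Set application. The fragment below is a sketch rather than a recommended configuration: the project ID, language, and threshold values are placeholders, and using ${UNIQUEID} as the session identifier is just one possible choice. It assumes a speech resource already exists on the channel.

```ini
; Illustrative dialplan fragment; every value here is a placeholder.
; Assumes SpeechCreate() has already been called on this channel.
same => n,Set(SPEECH_ENGINE(project_id)=my-project-12345)
; ASSUMPTION: reuse the channel's unique ID as the DialogFlow session ID
same => n,Set(SPEECH_ENGINE(session_id)=${UNIQUEID})
; override the default language (en)
same => n,Set(SPEECH_ENGINE(language)=es)
; per-call overrides of vad_voice_threshold and vad_voice_minimum_duration
same => n,Set(SPEECH_ENGINE(voice_threshold)=1024)
same => n,Set(SPEECH_ENGINE(voice_duration)=60)
```

Settings that are not set explicitly fall back to the defaults described above.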
Before detecting intent for your calls, you must first:
- create the speech resource (only do this once), and
- set the project_id.
same => n,SpeechCreate()
same => n,Set(SPEECH_ENGINE(project_id)=my-project-12345)
To detect intent from voice, you should call SpeechBackground to send audio to the module. You may specify a prompt to play while performing the detection.
same => n,SpeechBackground(hello-world)
To detect intent by event, you should activate a grammar with the name event:{your event name} prior to calling SpeechBackground. The SpeechBackground application will return immediately -- you should not include a prompt.
same => n,SpeechActivateGrammar(event:welcome)
same => n,SpeechBackground(hello-world)
The DialogFlow module returns the following results (when available):
response_id
- the unique identifier for this response
query_text
- the text of the speech recognized by DialogFlow
language_code
- the detected language of the recognized speech
action
- the action for the detected intent
fulfillment_text
- the text of the next prompt for the caller
intent_name
- the API name of the intent detected
intent_display_name
- the displayed name of the intent detected
raw_score
- the raw recognition score
fulfillment_message_N_text_M
- the fulfillment text messages from the response
fulfillment_message_N_simple_response_M
- the simple response fulfillment messages from the response
fulfillment_message_N_telephony_play_audio
- a URI from the telephony audio response fulfillment message
fulfillment_message_N_telephony_synthesize_speech
- text or SSML from the telephony synthesized speech fulfillment message
fulfillment_message_N_telephony_transfer_call
- a phone number for transferring from the telephony transfer call fulfillment message
fulfillment_message_N_telephony_terminate_call
- a flag indicating that the fulfillment message requested call termination
fulfillment_audio
- a path to audio corresponding to the fulfillment text
(Results with N or M in the name may occur multiple times, with different indexes in those positions.)
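One way to inspect these results is to loop over them with Asterisk's generic speech functions once SpeechBackground returns. The sketch below rests on an assumption this document does not confirm: that the module exposes each result's name through SPEECH_GRAMMAR(n) and its value through SPEECH_TEXT(n). Verify this against your installed version before depending on it.

```ini
; Illustrative sketch: log every result returned by the module.
; ASSUMPTION: result names are in SPEECH_GRAMMAR(n), values in SPEECH_TEXT(n).
same => n,SpeechBackground(hello-world)
same => n,Set(i=0)
same => n,While($[${i} < ${SPEECH(results)}])
same => n,NoOp(${SPEECH_GRAMMAR(${i})} = ${SPEECH_TEXT(${i})})
same => n,Set(i=$[${i} + 1])
same => n,EndWhile()
; release the speech resource when the call no longer needs it
same => n,SpeechDestroy()
```

SPEECH(results), SPEECH_GRAMMAR, SPEECH_TEXT, While, and SpeechDestroy are standard Asterisk dialplan functions and applications; only the mapping of result names to them is assumed here.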