Whisperer Voice SDK Demo

Whisperer is a Unity VR game experience using the Voice SDK for interactions. This repository includes the complete buildable Unity project.

Requirements

Unity 2021.3.9f1
Windows or Mac
Meta Quest 2 (standalone) or Rift (PCVR)

Getting Started

Ensure you have Git LFS installed:

git lfs install

Then, clone this repo using the "Code" button above, or with:

git clone git@github.com:wit-ai/voicesdk_samples_whisperer.git

All of the project files can be found in Assets/Whisperer. This folder includes all scripts and assets to run the experience, excluding those that are part of the Interaction SDK. The project includes v50 of the Voice SDK.

To run Whisperer in-editor, after configuring Wit.ai (see below), open the project in Unity 2021.3.9f1. Then open the Assets/Scenes/Loader scene and press play.

Configuring Wit.ai

Using Whisperer reqiures a Wit.ai account.

Once logged in, on wit.ai/apps, click New App and import the zipped app backup included in this repo.
Then find the Server Access and Client Acess Tokens your app setup under Managment > Settings. Enter these values in the appropriate fields on the Wit.ai App Config asset in the unity project.

For more information on setting up an App, check out the Wit.ai Quickstart.

Note: Wit.ai will need to train its model before it's ready to use. On Wit.ai, the current status of the training is indicated by the dot next to the app name.

How To Play

Whisperer's introduction will help guide you, through narrative instruction and visual prompts, how to interact with objects using your hands and voice.

When you raise your hands in front of you, as if to speak through them, the microphone will be automatically activated and the voice SDK will listen to you. You can then speak to various objects, telling them to move, open, turn on, etc.
The inset menu button on the left controller (☰) will open the in-game panel displaying an instruction card and demonstration video, as well as buttons to restart the current level or return the starting scene.

Unity Editor Play Mode Navigation

To move the camera around, use your mouse.
To select an object, look towards it and hold down space key.
To jump from one level to another, use 1, 2, 3, and 4 keys on the keyboard.

Project Structure

The Loader scene contains two game objects that persist throughout the entire experience: Player Rig and Management.

The Player Rig is the XR Origin, and contains the necessary components for Unity's XR Interaction Toolkit, as well as the SpeakGestureWatcher.cs component and any UI canvases.

Attached to the Management game object are AppVoiceExperience.cs and LevelLoader.cs. LevelLoader additively loads the necessary Unity scenes for each level, unloading them when a level is completed.

Level Loader

Each level consists of two scenes additively loaded by the levelLoader -- a base scene containing all static geometry and non-interactable objects, and a level scene containing all scene logic, animated objects, and listenable objects for that particular level.

Every level contains a Level Manager prefab and a Listenables prefab. The Level Manager is responsible for that scene's logic and instantiates a VoiceUI prefab for any objects derived from Listenable.cs at Start().

App Voice Experience

AppVoiceExperience is the core component of the Voice SDK. It holds the reference to the Wit.ai App Config asset, sends data to Wit.ai for processing, and responds with the appropriate Unity Events. When an object derived from Listenable.cs is selected and deselected by the player, it subscribes and unsubscribes to the events on AppVoiceExperience.

Voice SDK

Whisperer utilizes several different methods of handling responses from Wit.ai. Depending on the type of interaction (action) we're trying to resolve, we use either intents, entities, or manual parsing of the text transcription.

To determine when to activate and deactivate Wit.ai, the SpeakGestureWatcher.cs component checks the position of the tracked hand controllers and raycasts for objects that contain the Listenable.cs class. If the player's hands are in position and an object is found, AppVoiceExperience.Activate() is called. If at any time the player breaks the pose, Wit.ai is deactivated.

The AppVoiceExperience class itself is initated in the LevelManager.cs parent class which all subsequent levels inherit from.

If an object derived from Listenable.cs is selected and the player says something, the Whisperer will wait for a response from Wit.ai, then read the WitResponseNode to determine the action to be taken.

Example: If a [ForceMovable.cs](Assets/Whisperer/Scripts/Voice/Listenable Objects/ForceMovable.cs) is selected and the utterance "Move right a lot" is detected by Wit.ai, we read the intent and entities from the WitResponseNode to determine the direction and strength of move force applied. Whisperer reads the returned intent (move), direction entity (right) and strength entity (strong) and performs an appropriate action.

Conduit Implementation

This code base uses Conduit framework from Voice SDK.

To use Conduit, simply annotate the callback method with the MatchIntent attribute and annotate the assembly containing the callback method with the ConduitAssembly attribute. When changes are made to the callback method, such as adding, removing, or changing it, Unity generates a new manifest file. Please note that Use Conduit should be checked in your wit config asset file as documented here.

For example, in HeroPlant.cs the ForceMove method takes two parameters: a ForceDirection enum and a WitResponseNode.

In this code ForceMove is decorated with the MatchIntent attribute, with possible intent values: move, jump, pull, and push. By using the Conduit framework, these callback methods can be automatically registered without the need for manual registration.

[MatchIntent("move")]
[MatchIntent("jump")]
[MatchIntent("pull")]
[MatchIntent("push")]
public override void ForceMove(ForceDirection direction, WitResponseNode node)
{
    // method implementation
}

Intents and Entities

These intents are used to move objects in the scene. The move, pull, push, and jump intents can be used with strength and direction entities. For example, "Push away from me a little bit."

move
pull
push
jump
levitate
drop

The entity direction determines the direction an object is moved when accompanied with the move, pull, push, or jump intents.

right
left
up
toward (moves object toward the user)
away (moves object away from the user)
wall (moves object away from center of the room)
across (moves object toward the center of the room)

The entity move_strength determies the strength of the force applied to an object when it's moved.

weak
normal
strong

Generic intents for interacting with objects such as the radio, drawers, water hose or treasure chest:

open
close
turn_off
turn_on

Intents for interacting with specific objects, include:

turn_off_radio
turn_on_radio
change_station
ask_bird_name
bird_song

The entity color is used to identify when the user has communicated a color selection to the bird in level 3.

yellow
blue
red

Licenses

The Oculus License applies to the SDK and supporting material. The MIT licence applies to the files and assets in the Assets/Whisperer folder. Otherwise, if an individual file does not indicate which license it is subject to, then the Oculus License applies.

XR-Access-Initiative/voicesdk-samples-whisperer-captions