Please find the updated version of this repository here, featuring more modern SDKs and a modular system design.
This repository contains a template scene for interfacing with an AI-based NPC using the OpenAI API, the Azure Voice API, Google Cloud Speech-to-Text, and Oculus Lip Sync. The project uses Unity 2021.3.x.
The framework allows for easy integration with YOLO-NAS, enabling the NPC to stream virtual camera frames to a YOLO-NAS server instance, whether local or remote, and receive responses listing all identified objects along with their confidence scores. Additionally, it leverages a Ready Player Me avatar, providing a reference for mapping Oculus lip sync to avatar models.
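The exact shape of the response depends on your YOLO-NAS server implementation; purely as an illustration (the field names here are assumptions, not a fixed contract), a response for a single frame might look like:

```json
[
  { "label": "person", "confidence": 0.92 },
  { "label": "chair",  "confidence": 0.81 }
]
```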
Once you've set up the configuration files, you can run the main scenes. After speaking to the NPC, it will respond within seconds. In the `NPCVoiceSimple-MetaAssets` scene, the example models provided by Meta for Lip Sync will respond to user inquiries. In the `NPCVoiceVisionSimple` scene, the NPC can be configured to see its environment using YOLO-NAS.
The `ChatbotController` game object contains the main script for interfacing with the OpenAI API. To adjust the input parameters for each OpenAI API request, modify this component.
- Selecting the Default Model: Choose the default model from the model dropdown in the `ChatbotController` inspector.
- Assigning a Personality Profile: Specify a name and personality description in the `ChatbotController` component. To select a personality from the available options, write the name of the personality in the "Set Personality Profile Name" inspector value (see the sketch after this list).
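As a rough illustration of what such a profile amounts to (the class and field names below are hypothetical, not the actual `ChatbotController` fields), a personality profile is essentially a named description that the controller can inject into each request as a system-style prompt:

```csharp
using UnityEngine;

// Hypothetical sketch: a named personality description that a controller
// could prepend to each OpenAI request as a system message.
[System.Serializable]
public class PersonalityProfile
{
    public string profileName;             // matched against "Set Personality Profile Name"
    [TextArea] public string description;  // e.g. "You are a grumpy merchant who answers curtly."
}
```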
The `Camera Streamer` component allows you to specify the endpoint of the YOLO-NAS server. This component receives a JSON object from the server, listing all identified objects. To process this data, you can explore the `ReceiveData` thread.
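A minimal sketch of how that data could be deserialized, assuming the illustrative response shape shown earlier (the `Detection` fields are assumptions; match them to your server's actual JSON):

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Mirrors the assumed per-object response fields.
[System.Serializable]
public class Detection
{
    public string label;
    public float confidence;
}

public static class DetectionParser
{
    // Safe to call from a background thread such as ReceiveData: parsing
    // touches no Unity objects. Hand the results back to the main thread
    // before using them in the scene.
    public static List<Detection> Parse(string json)
    {
        return JsonConvert.DeserializeObject<List<Detection>>(json);
    }
}
```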
Make sure to configure the necessary settings and explore the various components to customize and enhance the NPC's behavior and interactions.
For the YOLO-NAS server instance, please check out this repository.
Check out the `NPCGroup` scene to set up multiple NPCs that chat with one another. One NPC is designed to initiate the conversation upon collision with another NPC (or the player). Using the `NPCGroupCommunicator` component, simply assign the NPC group you'd like, and the NPCs will use event systems to communicate with one another once they finish speaking.
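As an illustration of this event-driven hand-off (a sketch only, not the actual `NPCGroupCommunicator` API), one NPC raises an event when it finishes speaking and the others feed that utterance into their own chat requests:

```csharp
using System;
using UnityEngine;

public class GroupNpc : MonoBehaviour
{
    // Raised by whichever NPC just finished speaking; payload is the utterance.
    public static event Action<GroupNpc, string> OnFinishedSpeaking;

    void OnEnable()  { OnFinishedSpeaking += HandleTurn; }
    void OnDisable() { OnFinishedSpeaking -= HandleTurn; }

    public void NotifyFinishedSpeaking(string text)
    {
        OnFinishedSpeaking?.Invoke(this, text);
    }

    void HandleTurn(GroupNpc speaker, string lastUtterance)
    {
        if (speaker == this) return;
        // Feed lastUtterance into this NPC's next OpenAI request here.
    }
}
```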
- Integrate RageAgainstThePixel's OpenAI library (https://github.com/RageAgainstThePixel/com.openai.unity).
- Add support for ElevenLabs' text-to-speech library (https://github.com/RageAgainstThePixel/com.rest.elevenlabs).
- Make use of YOLO-NAS results in OpenAI conversation requests.
- Add example scene with Avaturn Avatar Models (https://avaturn.me/).
- Further test group NPC conversations.
- Implement a dynamic way of invoking pre-downloaded animations.
- Update the Meta Movement SDK to support URP and resolve pink assets.
These additions to the project will enhance its capabilities and expand the available libraries and features over time.
Feel free to contribute to the project!
Before running the scene, you'll need to set up the following services and create a configuration file for the application to read at runtime:
Review the setup instructions for the following repositories that are used in this project:
In the `StreamingAssets` folder, create a `services_config.json` file with the following template, and replace the placeholder values with your own API keys and region information:
{
"OpenAI_APIKey": "your_openai_api_key",
"AzureVoiceSubscriptionKey": "your_azure_voice_subscription_key",
"AzureVoiceRegion": "your_azure_voice_region"
}
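As a sketch of how this file can be read at runtime (the `ServicesConfig` field names simply mirror the template above; the project's actual loader may differ):

```csharp
using System.IO;
using UnityEngine;

[System.Serializable]
public class ServicesConfig
{
    public string OpenAI_APIKey;
    public string AzureVoiceSubscriptionKey;
    public string AzureVoiceRegion;
}

public static class ConfigLoader
{
    public static ServicesConfig Load()
    {
        // Note: on Android, StreamingAssets must be read via UnityWebRequest instead.
        string path = Path.Combine(Application.streamingAssetsPath, "services_config.json");
        return JsonUtility.FromJson<ServicesConfig>(File.ReadAllText(path));
    }
}
```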
Create a `gcp_credentials.json` file for the Google Cloud runtime to read configuration properties from, using the following template:
{
"type": "service_account",
"project_id": "YOUR PROJECT ID",
"private_key_id": "YOUR PRIVATE KEY ID",
"private_key": "YOUR PRIVATE KEY",
"client_email": "YOUR CLIENT EMAIL",
"client_id": "YOUR CLIENT ID",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "YOUR CLIENT CERT URL"
}
Google Cloud's Speech-to-Text enforces a 5-minute limit on streaming requests, so the developer must add the ability to start, stop, and restart the stream; one possible approach is sketched below.
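This sketch assumes you have added your own start/stop hooks to the speech-to-text component; `StartStream` and `StopStream` below are placeholders for those hooks, not existing API calls:

```csharp
using System.Collections;
using UnityEngine;

public class StreamRestarter : MonoBehaviour
{
    // Restart at 4.5 minutes to stay safely inside the 5-minute limit.
    const float RestartInterval = 270f;

    IEnumerator Start()
    {
        while (true)
        {
            StartStream();
            yield return new WaitForSeconds(RestartInterval);
            StopStream();
        }
    }

    void StartStream() { /* begin a new streaming recognition request */ }
    void StopStream()  { /* gracefully end the current request */ }
}
```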
Sometimes, when opening the project for the first time, you might get the following error message: "Multiple precompiled assemblies with the same name Newtonsoft.Json.dll included on the current platform." This error occurs because Unity force-imports Newtonsoft.Json due to its services.core dependency. To resolve this error:
- Close the project.
- Open the file explorer and navigate to `\Library\PackageCache\com.oshoham.unity-google-cloud-streaming-speech-to-text@0.1.8\Plugins`.
- Delete the `Newtonsoft.dll` file.
- Reopen the project and hit "Ignore".
- Delete `Newtonsoft.dll` again from the same location.
- The import should now complete.
Models Used: This project currently uses the TextDavinciV3 and ChatGpt3_5Turbo models. The code also supports GPT4, though the demo scenes do not use it.
In addition to the APIs and packages mentioned above, this project also uses the Meta Movement SDK. More information on this SDK can be found in its GitHub repository at https://github.com/oculus-samples/Unity-Movement.