Create a world with voice commands

My first OpenAI project. I can place objects in the world with my voice. I use "OpenAI Codex" to generate JSON and "Microsoft Azure Speech-To-Text" to convert my voice to text.

Note: To make the program run you need your own API keys. More info here

YouTube demo link

How it works

1. Create a world

To display the VR world I use the open-source VR game engine StereoKit. Its main programming language is C#. This is just a basic example of how to draw a cube.

using StereoKit;

class Program
{
    static void Main(string[] args)
    {
        SK.Initialize(new SKSettings{ appName = "Project" });
        SK.Run(() =>
        {
            Mesh.Cube.Draw(Material.Default, Matrix.S(0.1f));
        });
    }
}

2. Use the OpenAI API

Right prompts

The first step is to write a start prompt, which the AI then continues. The start prompt sets up the rules for the AI: it shows the expected format with a few examples and ends with an open "text:" line.

Create a json block from prompt.
Example:
text:Create a blue cube at position one one one
json:{"id": 0, "position": {"x": 1, "y": 1, "z": 1}, "scale": {"x": 1.0, "y": 1.0, "z": 1.0}, "shape": "cube", "color": {"r": 0.0, "g": 0.0, "b": 1.0}}
text:remove or delete the blue cube
json:{"id": 0, "remove": true}
Real start with id 0:
text:

The OpenAI Playground is a good place to test prompts. I used the Codex Cushman model.
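One way to feed a spoken command into the start prompt is to append it after the trailing "text:" and then add "json:", so the model writes the JSON next. The block below is a minimal sketch under that assumption. PromptBuilder, AddCommand and AddReply are hypothetical names for illustration, not code from the project.

```csharp
using System;

static class PromptBuilder
{
    // Appends one spoken command to a prompt that ends with "text:" and
    // leaves the prompt ending with "json:" so the model writes the JSON next.
    public static string AddCommand(string prompt, string spokenText)
    {
        // Keep the command on a single line, like the examples in the start prompt.
        string oneLine = spokenText.Replace("\r", " ").Replace("\n", " ").Trim();
        return prompt + oneLine + "\njson:";
    }

    // Appends the model's reply and opens the next "text:" line, so later
    // commands can refer to objects and ids created earlier (an assumption
    // about how the conversation is kept going).
    public static string AddReply(string prompt, string jsonReply)
    {
        return prompt + jsonReply.Trim() + "\ntext:";
    }
}
```

Calling AddCommand before each request and AddReply after each response keeps the whole exchange in one growing prompt. The trade-off is that the prompt gets longer with every command.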

API

To use the OpenAI API you first need an API key, which can be created in your OpenAI account under Personal -> View API keys.

Create an api object and an async GenerateAIResponce function. The request has to run asynchronously, off the thread that draws the world, so the program doesn't freeze while waiting for a response.

using System.Threading.Tasks;
using OpenAI_API.Completions;

var api = new OpenAI_API.OpenAIAPI(openAiKey);

static async Task<CompletionResult> GenerateAIResponce(OpenAI_API.OpenAIAPI anApi, string aPrompt)
{
    var request = new CompletionRequest(
            prompt: aPrompt,
            model: OpenAI_API.Models.Model.CushmanCode,
            temperature: 0.1,
            max_tokens: 256,
            top_p: 1.0,
            frequencyPenalty: 0.0,
            presencePenalty: 0.0,
            stopSequences: new string[] { "text:", "json:", "\n" }
            );
    var result = await anApi.Completions.CreateCompletionAsync(request);
    return result;
}
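To keep the world rendering while a request is in flight, one option is to start the task once and then check each frame whether it has finished. The block below is a sketch of that idea, not the project's actual code. PendingResponse, Start and TryTake are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Holds one in-flight request and hands back its outcome once it is ready.
class PendingResponse<T>
{
    private Task<T> myTask;

    // True while a request has been started and its outcome not yet taken.
    public bool IsBusy => myTask != null;

    // Starts the request on the thread pool. Ignored if one is already running.
    public void Start(Func<Task<T>> startRequest)
    {
        if (myTask == null)
        {
            myTask = Task.Run(startRequest);
        }
    }

    // Call once per frame. Returns true once per finished request,
    // with either a result or the error that ended the request.
    public bool TryTake(out T result, out Exception error)
    {
        result = default;
        error = null;

        if (myTask == null || !myTask.IsCompleted)
        {
            return false; // nothing started, or still waiting
        }

        if (myTask.IsFaulted)
        {
            error = myTask.Exception.GetBaseException();
        }
        else if (myTask.IsCanceled)
        {
            error = new OperationCanceledException();
        }
        else
        {
            result = myTask.Result; // safe: the task has already completed
        }

        myTask = null; // ready for the next request
        return true;
    }
}
```

Inside the per-frame callback this could be used by calling Start with a lambda that calls GenerateAIResponce(api, prompt) when a new command arrives, and calling TryTake every frame to pick up the result without blocking.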

The response from OpenAI is a string, which I parse into a JSON object (a Newtonsoft JObject). I then read the values of the JSON object and update the matching object in the world.

public void UpdateFromJSON(JObject someData)
{
    someData.TryGetValue("position", out JToken JPos);
    someData.TryGetValue("scale", out JToken JScale);
    someData.TryGetValue("shape", out JToken JShape);
    someData.TryGetValue("color", out JToken JColor);

    //Position
    if (JPos != null)
    {
        myPose.position = JSONConverter.FromJSONVec3((JObject)JPos);
    }
    //Scale
    if (JScale != null)
    {
        myScale = JSONConverter.FromJSONVec3((JObject)JScale);
    }
    //Mesh
    if (JShape != null)
    {
        string str = JShape.ToString();
        myShape = str;

        if (str == "cube")
        {
            myModel = Model.FromMesh(Mesh.Cube, Material.UI);
        }
...
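The model does not always return valid JSON, so it can help to parse defensively before calling UpdateFromJSON. The block below is a rough sketch using Newtonsoft. ResponseParser and its methods are hypothetical names, and ToVector3 is only a stand-in for the project's JSONConverter.FromJSONVec3, whose real implementation is not shown here. It uses System.Numerics.Vector3 so the sketch does not depend on StereoKit.

```csharp
using System;
using System.Numerics;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

static class ResponseParser
{
    // Parses one completion into a JObject. Returns false instead of throwing
    // when the text is empty or is not a single JSON object.
    public static bool TryParse(string responseText, out JObject data)
    {
        data = null;
        if (string.IsNullOrWhiteSpace(responseText))
        {
            return false;
        }

        try
        {
            data = JObject.Parse(responseText.Trim());
            return true;
        }
        catch (JsonReaderException)
        {
            return false;
        }
    }

    // True for blocks such as {"id": 0, "remove": true}.
    public static bool IsRemove(JObject data)
    {
        JToken token = data["remove"];
        return token != null && token.Type == JTokenType.Boolean && (bool)token;
    }

    // Reads {"x": .., "y": .., "z": ..}. Missing axes become 0.
    // Assumes the axis values are JSON numbers.
    public static Vector3 ToVector3(JObject vec)
    {
        return new Vector3(
            vec.Value<float?>("x") ?? 0f,
            vec.Value<float?>("y") ?? 0f,
            vec.Value<float?>("z") ?? 0f);
    }
}
```

With this, a response that fails TryParse can simply be ignored, and a block where IsRemove is true can delete the object with the given id instead of updating it.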

3. Convert speech to text

To use Microsoft's speech service we first need to create a "Speech Service resource" and get our key. We then create a speechRecognizer and add a lambda expression to the Recognizing event. This event fires repeatedly while someone is speaking and carries the intermediate text recognized so far.

var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechRecognitionLanguage = "en-US";

using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

speechRecognizer.Recognizing += (s, e) =>
{
    speechAIText = e.Result.Text;
};

Start the speech recognition by calling:

speechRecognizer.StartContinuousRecognitionAsync().Wait();

And to end the speech recognition call:

speechRecognizer.StopContinuousRecognitionAsync().Wait();
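The Speech SDK also raises a Recognized event with the final text of each phrase, and a Canceled event when something goes wrong (for example a wrong key or region). The block below is a sketch of how final phrases could be collected in a thread-safe queue, on the assumption that the handlers run on a different thread than the render loop. SpeechInbox, Attach and Accept are hypothetical names, not part of the SDK or of this project.

```csharp
using System;
using System.Collections.Concurrent;
using Microsoft.CognitiveServices.Speech;

static class SpeechInbox
{
    // Finished phrases waiting to be sent to the AI. Safe to use from several threads.
    public static readonly ConcurrentQueue<string> Phrases = new ConcurrentQueue<string>();

    // Wires the handlers. Call this before StartContinuousRecognitionAsync.
    public static void Attach(SpeechRecognizer recognizer)
    {
        // Recognized carries the final text of a phrase, unlike Recognizing.
        recognizer.Recognized += (s, e) => Accept(e.Result.Reason, e.Result.Text);

        // Canceled reports errors such as an invalid key or a lost connection.
        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"Speech recognition canceled: {e.Reason} {e.ErrorDetails}");
        };
    }

    // Queues the phrase only if speech was actually recognized.
    public static bool Accept(ResultReason reason, string text)
    {
        if (reason != ResultReason.RecognizedSpeech || string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        Phrases.Enqueue(text.Trim());
        return true;
    }
}
```

The render loop can then call Phrases.TryDequeue once per frame and send each phrase to the AI. This avoids sending half-finished sentences, at the cost of waiting until the speaker pauses.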

Store our keys securely with "Visual Studio user secrets"

Don't store your keys in the Git project, where they can accidentally be pushed to GitHub! Visual Studio user secrets is a JSON file stored on your local PC in a folder outside the project. In Solution Explorer, right-click the project, choose Manage User Secrets and add your keys to the JSON file.

{
  "OPENAI_API_KEY": "xxxxxxxxxxxxxxxxxx",
  "SPEECH_KEY": "xxxxxxxxxxxxxxxxx",
  "SPEECH_REGION": "xxxxxxxxxxx"
}

We can then load in our keys by calling:

var config = new ConfigurationBuilder().AddUserSecrets<Program>().Build();
string openAiKey = config.GetSection("OPENAI_API_KEY").Value;
string speechKey = config.GetSection("SPEECH_KEY").Value;
string speechRegion = config.GetSection("SPEECH_REGION").Value;
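If a key is missing from the secrets file, the value comes back as null and the error only shows up later, when an API call fails. A small check at startup can make that easier to spot. This is just a sketch; Secrets and Require are hypothetical names.

```csharp
using System;
using Microsoft.Extensions.Configuration;

static class Secrets
{
    // Returns the value for a key, or throws with a hint about where to add it.
    public static string Require(IConfiguration config, string name)
    {
        string value = config[name];
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Missing secret '{name}'. Add it to the user secrets JSON file.");
        }
        return value;
    }
}
```

The three lines above could then read, for example, string openAiKey = Secrets.Require(config, "OPENAI_API_KEY"); and the program stops right away with a clear message if a key was forgotten.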

Packages that are used

StereoKit, an open-source VR game engine
OpenAI API C#/.NET wrapper, to make API calls to OpenAI
Microsoft Azure Speech to text, to convert speech to text
Newtonsoft Json.NET, a JSON framework

CanYouCatchMe01/CreateWorldWithOpenAI