Copyright 2022 Bart Trzynadlowski
ChatARKit is an experiment to see whether ChatGPT can be harnessed to write code using custom user-defined APIs. You can speak a prompt asking ChatARKit to place objects of a certain type on nearby planes and perform some basic manipulations of their position, scale, and orientation. More interactions could readily be added. Here are some sample prompts to try:
- "Place a cube on the nearest plane."
- "Place a spinning cube on the floor."
- "Place a sports car on the table and rotate it 90 degrees."
- "Place a school bus on the nearest plane and make it drive back and forth along the surface."
Full disclosure: the demo video represents some of the best results I got. Performance is generally worse. ChatGPT produces highly variable results for identical prompts. It frequently injects JavaScript functions that are not present in my JavaScriptCore execution context and will misinterpret the relationship between the user, cameraPosition, and planes. Sometimes it converts object descriptions into code-like identifiers (e.g., "school bus" becomes "schoolBus"), which breaks the Sketchfab querying logic. Much of this is fixable and it's a good idea to examine the generated code, which is printed to the console by both the iOS app and the ChatGPT server. If OpenAI changes the model or takes further action to block bots, this demo could cease to function altogether.
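One possible fix for the identifier problem is to turn code-like names back into plain descriptions before they are used as a search query. The following is a minimal sketch in JavaScript, not code from the app; the function name normalizeDescription is hypothetical, and where such a step would be hooked in (the prompt, the server, or the app) is left open.

```javascript
// Hypothetical helper: converts identifier-style text such as "schoolBus"
// or "sports_car" back into a plain, lowercase description.
function normalizeDescription(description) {
  return description
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase boundaries
    .replace(/[_-]+/g, " ")                 // underscores and hyphens become spaces
    .replace(/\s+/g, " ")                   // collapse repeated whitespace
    .trim()
    .toLowerCase();
}

console.log(normalizeDescription("schoolBus"));
console.log(normalizeDescription("sports_car"));
```

Descriptions that are already plain text, such as "school bus", pass through unchanged.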
I encourage interested developers to contact me and discuss ways this could be improved and turned into a more robust demo. And if you have any other fun ideas for AI projects, I'm always up to chat :)
ChatARKit consists of an iOS app and a Python server for interacting with ChatGPT. It makes use of the following projects:
- whisper.cpp by Georgi Gerganov. Fast C++ implementation of OpenAI's Whisper speech-to-text model.
- GLTFSceneKit by magicien. glTF loader for SceneKit, used to import 3D assets from Sketchfab.
- Zip by Roy Marmelstein. Swift framework for unzipping files, used to unzip files downloaded from Sketchfab.
- pyChatGPT by terry3041. A Python wrapper for accessing ChatGPT. My Python ChatGPT server app depends on this.
Sketchfab is used to fetch 3D assets (except "cube" which is handled natively). Sign up for a free user account and then under your profile settings, find your API token. Open up iOS/ChatARKit/ChatARKit/Engine/SketchfabEntity.swift and overwrite the dummy token in _jsonPayloadHeader near the top of the SketchfabEntity class.
The profile settings can be found by clicking on your user icon in the upper right. Then, click Edit Profile.
The API token is under Passwords & API.
Log into ChatGPT and find your session token. In Chrome, open up Developer Tools and look for __Secure-next-auth.session-token under the Application tab and Cookies, which appears under Storage.
pyChatGPT must be installed. I recommend using conda to manage your environment. In the ChatARKit repository directory, create a conda environment, clone pyChatGPT, and install it like so:
conda create -n chatarkit python=3.10
conda activate chatarkit
git clone https://github.com/terry3041/pyChatGPT.git
cd pyChatGPT
pip install -U pyChatGPT
If using conda, make sure your conda environment is activated, e.g.:
conda activate chatarkit
Then simply run chatgpt.py:
python -m chatgpt
By default, it will listen on port 6502. It will ask you for the session token obtained in step 2. Paste it and press enter. A Chromium window should appear and automatically log into ChatGPT.
Before opening the iOS project, make sure to download the required Whisper model. From the root of the repository directory on macOS, type:
curl -L --output iOS/ChatARKit/ChatARKit/ggml-base.en.bin https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
Open iOS/ChatARKit/ChatARKit.xcodeproj and deploy ChatARKit to your iPhone. Connect to the ChatGPT Python relay server by using your computer's local IP address. On Windows, this can be found using the ipconfig command. On macOS or Linux, use ifconfig.
Look around to ensure some planes are detected and then press the Record button and speak a prompt. For example: "Place a cube on the nearest plane." Press Stop when finished to parse the result. You should see ChatGPT producing a response in the Chromium window. This can take a very long time. Be patient. Once ChatGPT is finished, the code is beamed back to your iPhone and should hopefully execute successfully.
The ChatGPT relay server is pretty straightforward: it accepts connections, passes prompts received from them to ChatGPT, and then sends back the responses with prose and code neatly segregated.
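To illustrate what separating prose from code can look like, here is a minimal sketch. It is written in JavaScript purely for illustration; the real relay server is Python and may do this differently. It assumes ChatGPT wraps code in Markdown-style triple-backtick fences, and the name splitResponse is hypothetical.

```javascript
// The fence marker is built at run time so that this snippet does not
// contain a literal triple backtick.
const FENCE = "`".repeat(3);

// Matches an opening fence with an optional language tag, the block body,
// and the closing fence.
const CODE_BLOCK = new RegExp(FENCE + "[a-zA-Z]*\\n([\\s\\S]*?)" + FENCE, "g");

// Hypothetical helper: returns the prose with code blocks removed, plus the
// contents of all code blocks joined by newlines.
function splitResponse(response) {
  const blocks = [];
  const prose = response.replace(CODE_BLOCK, (match, body) => {
    blocks.push(body.trim());
    return "";
  });
  return { prose: prose.trim(), code: blocks.join("\n") };
}

const reply =
  "Here is the code:\n" +
  FENCE + "javascript\n" +
  "let cube = createEntity('cube');\n" +
  FENCE + "\n" +
  "Enjoy!";

const result = splitResponse(reply);
console.log(result.code);
```

Only the code part would need to be executed on the device; the prose is mainly useful for debugging.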
The ChatARKit iOS app is more complex. A JavaScript environment is set up allowing scripts that create entities (objects with a visual representation that can be manipulated) to be run. The important source files to understand are:
- ViewController.swift: This manages the UI and the connection to the relay server. When connected, a Record button appears in the lower left. Pressing this button starts capturing audio from the microphone and pressing stop passes it to Whisper for transcription. Most of this logic is near the very top in onToggleRecordButtonPressed(). The text is sent to ChatGPT and the response eventually arrives in onMessageReceived(), where the JavaScript code contained inside is passed to the AR engine for execution.
- Engine.swift: This is the AR experience engine. It is responsible for 1) running the ARKit session and 2) executing JavaScript code. JavaScriptCore is used to create a JavaScript execution context with a few custom functions and variables defined for scripts to use. The engine also constructs the final prompt sent to ChatGPT in augmentPrompt(). Here, the custom JavaScript environment is described in a lengthy preamble, the user prompt is spliced in, and some additional constraints are added for good measure.
- Entity.swift: Engine.createEntity() creates entities that are exposed to the JavaScript run-time. They contain only a few properties (position, scale, and euler, each a 3-element array of floats). I couldn't quite figure out how to expose these properties correctly. Using the JSExport protocol produces fields that are read-only, so I employ a gross workaround.
The following methods and properties are exported to JavaScript:
- print(): Prints to the console.
- getPlanes(): Returns an array of all planes that have been found by ARKit.
- getNearestPlane(): Returns the nearest plane to the user.
- getGroundPlane(): Returns the lowest plane.
- createEntity(): Takes a string describing the entity to create and instantiates it. If the description is "cube", it will create a simple cube. Otherwise, the description is used to search Sketchfab for a model to import.
- distance(): Computes the distance between two 3D vectors represented as 3-element arrays. ChatGPT sometimes insists on using functions it thinks exist, and distance() is a common one. Instead of trying to fight it, I gave in and implemented it.
- cameraPosition: The current AR camera position as a 3-element array.
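To make the list above concrete, here is a minimal sketch of the kind of script these functions are meant to support, for a prompt like "Place a cube on the nearest plane." The stubs at the top are stand-ins so the snippet runs outside the app; in ChatARKit the real versions are supplied by the JavaScriptCore context. The plane field name center and all stub values are assumptions made for illustration, not taken from the app.

```javascript
// Stand-ins for the host environment (hypothetical). In the app, these are
// provided by the JavaScriptCore context and will differ in detail.
const cameraPosition = [0, 1.5, 0];
const stubPlanes = [
  { center: [0, 0, -1] },   // "center" is an assumed field name
  { center: [2, 0.7, -3] },
];

function print(message) {
  console.log(message);
}

function getPlanes() {
  return stubPlanes;
}

// Euclidean distance between two 3-element arrays.
function distance(a, b) {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Picks the plane whose (assumed) center is closest to the camera.
function getNearestPlane() {
  let nearest = null;
  let best = Infinity;
  for (const plane of getPlanes()) {
    const d = distance(plane.center, cameraPosition);
    if (d < best) {
      best = d;
      nearest = plane;
    }
  }
  return nearest;
}

// Returns a plain object with the three properties entities expose.
function createEntity(description) {
  return { description, position: [0, 0, 0], scale: [1, 1, 1], euler: [0, 0, 0] };
}

// The kind of script a prompt is meant to produce.
const plane = getNearestPlane();
const cube = createEntity("cube");
cube.position = plane.center.slice(); // copy so the plane data is not shared
cube.scale = [0.5, 0.5, 0.5];
print("Placed cube at " + cube.position.join(", "));
```

The euler property can be assigned the same way as position and scale. Its units are not stated here, so the sketch leaves it untouched.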
It is interesting to note that ChatGPT is usually smart enough to implement the functionality of getNearestPlane() and sometimes even getGroundPlane() on its own. However, it occasionally gets spectacularly confused and will do something nonsensical. Luckily, it will usually end up using functions that are described to it in the prompt. ChatGPT also has a nasty habit of defining functions after it has used them, which produces code that fails to run.
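The exact form of the failing code is not recorded here, but one way definition order can matter in standard JavaScript is sketched below. Function declarations are hoisted and can be called before they appear, whereas a function bound with const (or let) cannot be used until the line that defines it has run. The function names are made up for the example.

```javascript
// A function declaration is hoisted, so this early call works.
const doubled = double(2);

function double(x) {
  return 2 * x;
}

// A const-bound arrow function is not usable before its definition:
// the early call throws instead of running.
let errorName = null;
try {
  triple(2);
} catch (e) {
  errorName = e.name;
}

const triple = (x) => 3 * x;

console.log(doubled, errorName, triple(2));
```

When generated code fails for this reason, moving the helper definitions to the top of the script is usually enough to make it run.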
Common sources of problems:
- ChatGPT generates bad code that fails to run. Examine the output carefully when this happens.
- No suitable models are found by Sketchfab, the models fail to download, or an import error is experienced.