This project demonstrates the integration of OpenAI's GPT-4 Vision API with a HoloLens application. Users can capture images using the HoloLens camera and receive descriptive responses from the GPT-4V model. Alternatively, users can import and specify image files (tested with .jpg) and receive GPT-4V responses.
This app uses Unity 2022.3.4f1. Newer versions should work, but are untested.
- Newtonsoft.Json (used for parsing OpenAI's response object, so somewhat optional)
- Open the GPT4VisionExample scene
- Specify your OpenAI key in the GameObject GPT4Vision > OpenAIWrapper (or hardcode it into the OpenAIWrapper.cs class)
- Specify your base prompt (which is sent to OpenAI along with the image), e.g. "describe this image:"
- Specify max tokens, sampling temperature, and image detail for the OpenAI API call
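
For orientation, here is a rough sketch of how the settings above could map onto the JSON body of a chat completions request. It is not taken from OpenAIWrapper.cs: the class and method names are made up for illustration, the model name `gpt-4-vision-preview` and the JPEG MIME type are assumptions, and the project's actual request code may look different.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of this project.
public static class VisionRequestSketch
{
    // Builds a request body with one text part (the base prompt) and one image part.
    public static string BuildRequestBody(
        byte[] imageBytes,
        string basePrompt,
        int maxTokens,
        double temperature,
        string imageDetail,                      // "low", "high", or "auto"
        string model = "gpt-4-vision-preview")   // assumed model name
    {
        // The image is embedded inline as a base64 data URL (JPEG assumed).
        string dataUrl = "data:image/jpeg;base64," + Convert.ToBase64String(imageBytes);

        var body = new JObject
        {
            ["model"] = model,
            ["max_tokens"] = maxTokens,
            ["temperature"] = temperature,
            ["messages"] = new JArray
            {
                new JObject
                {
                    ["role"] = "user",
                    ["content"] = new JArray
                    {
                        new JObject { ["type"] = "text", ["text"] = basePrompt },
                        new JObject
                        {
                            ["type"] = "image_url",
                            ["image_url"] = new JObject
                            {
                                ["url"] = dataUrl,
                                ["detail"] = imageDetail
                            }
                        }
                    }
                }
            }
        };
        return body.ToString();
    }
}
```

A body like this would be sent as a POST to `https://api.openai.com/v1/chat/completions` with the key in an `Authorization: Bearer` header.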
When running the application within the editor, the GameObject ImageTest will send an example image stored in the Images folder to GPT-4V and ask for an image description, which is printed to the console after a few seconds.
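
If you want to pull the description out of the raw response yourself, the following is a minimal sketch using Newtonsoft.Json. It assumes the standard chat completions response shape (`choices[0].message.content`); the class and method names are hypothetical and not part of this project.

```csharp
using System;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of this project.
public static class VisionResponseSketch
{
    // Returns the text of the first choice, or null if that field is missing.
    public static string ExtractDescription(string responseJson)
    {
        JObject root = JObject.Parse(responseJson);

        // Failed requests carry an "error" object instead of "choices".
        JToken error = root["error"];
        if (error != null && error.Type != JTokenType.Null)
        {
            throw new InvalidOperationException("OpenAI returned an error: " + error);
        }

        // SelectToken returns null rather than throwing when the path does not match.
        return (string)root.SelectToken("choices[0].message.content");
    }
}
```

In Unity, the returned string could then be passed to `Debug.Log` or shown in a UI text element.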
You can call the OpenAIWrapper function AnalyzeImageWithPrompt with any image as byte[] and pass your own base prompt with the call. To do so, reference the OpenAIWrapper component from one of your own scripts.
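
As a rough illustration of that flow, the sketch below reads an image file into a byte[] and hands it to an analyzer together with a prompt. The exact signature of AnalyzeImageWithPrompt is not documented here, so the `AnalyzeImage` delegate is a hypothetical stand-in (image bytes and prompt in, response text out). Adapt it to whatever the real method expects.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of this project.
public static class ImageCallSketch
{
    // Assumed shape of the call; the real AnalyzeImageWithPrompt may differ
    // (e.g. it could be a coroutine or take a callback).
    public delegate Task<string> AnalyzeImage(byte[] imageBytes, string basePrompt);

    // Loads the file at 'path' and forwards its bytes plus the prompt to 'analyze'.
    public static async Task<string> DescribeFileAsync(
        string path, string basePrompt, AnalyzeImage analyze)
    {
        if (!File.Exists(path))
        {
            throw new FileNotFoundException("Image file not found", path);
        }

        byte[] imageBytes = File.ReadAllBytes(path);
        return await analyze(imageBytes, basePrompt);
    }
}
```

Inside Unity, a Texture2D can be turned into a byte[] with `EncodeToJPG()` instead of reading from disk.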
If you want to capture and use photos from a HoloLens directly, disable the ImageTest GameObject and link the CapturePhoto function of GPT4Vision.cs to an input of your choice (e.g., a HoloLens button or a finger gesture). Please be aware that capturing images with HoloLens only works on a real device: the simulator and the HoloLens Remoting Tool are not supported.
HttpClient tends to crash on HoloLens builds compiled with MSVC v141 or v142, for reasons not yet identified. Building with MSVC v143 avoids the crash.
This project is a barebones prototype for now and still WIP. Feel free to create a PR.