This is a Python app written in Shiny, for easily interacting with GPT-4o via short webcam recordings. It was created for a livestream with @TinaHuang1 and Posit.
At the time of this writing (late May 2024), GPT-4o is available via OpenAI's chat completion API, but this only takes text and images as input and returns text as output. This app uses speech-to-text and text-to-speech to bridge the gap, allowing you to speak your prompt and provide a webcam feed, and hear the response.
Live demo: https://jcheng.shinyapps.io/multimodal/
You will need the ffmpeg
utility installed. Either use the official installers, or brew install ffmpeg
(for macOS brew users) or choco install ffmpeg
(for Windows chocolatey users).
Create a file called .env
in the root of the project and add the following line:
OPENAI_API_KEY=<your-api-key>
If you have an OpenAI account, you can generate an API key from this page.
pip install -r requirements.txt
shiny run app.py --port 0 --launch-browser
This will launch a browser window with a video preview. Press Record, speak your prompt, and press Stop. The video will be processed and the response will be read aloud.