Reacting to content with GPT-4V, OpenAI TTS, Cloudflare Workers and Mac Shortcuts

This repo shows you how to build a Cloudflare Workers application that has GPT-4V react, in real time, to whatever you are doing on your computer.

To run this application you need:

Cloudflare Account: This app can be run using the Workers free tier. Sign up for free here.
OpenAI Account and API Key: You can sign up for an account here and generate an API key here.
Mac or iOS device

Set up (Mac Shortcuts)

This application works by running a Mac Shortcut that takes the following actions:

Takes a screenshot using the "Take screenshot" action
Resizes that screenshot to be smaller using the "Resize" action
Sends the resized screenshot to our Worker using the "Get contents of" action
Plays the sound returned by our Worker using the "Play sound" action

To do this you can create a new Shortcut called "GPTReact" and copy the Shortcut configuration in this screenshot:

Set up (Cloudflare Workers)

Clone this repo and then run:

npm install

After installing your dependencies, add your OpenAI API key as a secret (an encrypted environment variable) so the Worker can use it to make requests to OpenAI:

npx wrangler secret put OPENAI_API_KEY
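
For context, a secret stored with `wrangler secret put` is exposed to a module-syntax Worker on the `env` argument of its `fetch` handler. The following is a minimal sketch, assuming the Worker uses module syntax; the `Env` interface and the `openAIHeaders` helper are hypothetical names used for illustration and are not taken from this repo:

```typescript
// Shape of the bindings the Worker is assumed to receive.
// The property name matches the secret name used in the command above.
interface Env {
  OPENAI_API_KEY: string;
}

// Hypothetical helper: build the headers for an OpenAI request from the secret,
// failing fast with a clear message if the secret has not been set.
function openAIHeaders(env: Partial<Env>): Record<string, string> {
  if (!env.OPENAI_API_KEY) {
    throw new Error(
      "OPENAI_API_KEY is not set; run `npx wrangler secret put OPENAI_API_KEY`"
    );
  }
  return {
    Authorization: `Bearer ${env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  };
}
```

Checking for the secret up front surfaces a missing key as a readable error rather than as a 401 response from OpenAI.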

With your secret set, you can run this application locally to try it out:

npm run dev

Copy and paste your local URL into the "Get contents of" action in your Shortcut, then run the Shortcut. It takes about 10-15 seconds for GPT-4V to generate a response to your image and for OpenAI's TTS to turn that response into spoken audio.
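
The flow described above (screenshot in, GPT-4V reaction, spoken audio out) could be sketched roughly as follows. This is not the code in this repo, only an illustration under several assumptions: the Shortcut sends the PNG bytes as the raw POST body, the model names `gpt-4-vision-preview` and `tts-1` and the voice `alloy` are still appropriate (you may need a current vision-capable model), and the helper names and the prompt text are hypothetical.

```typescript
interface Env {
  OPENAI_API_KEY: string; // set with `npx wrangler secret put OPENAI_API_KEY`
}

// Encode raw image bytes as base64 so they can be embedded in a data URL.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Request body for the chat completions endpoint: one text part, one image part.
function buildVisionBody(imageBase64: string, mimeType = "image/png") {
  return {
    model: "gpt-4-vision-preview", // assumed model name
    max_tokens: 150,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "React briefly to what is on this screen." },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Request body for the speech endpoint, which returns MP3 audio by default.
function buildSpeechBody(text: string) {
  return { model: "tts-1", voice: "alloy", input: text };
}

// POST a JSON body to an OpenAI endpoint with the API key attached.
function callOpenAI(path: string, apiKey: string, body: unknown): Promise<Response> {
  return fetch(`https://api.openai.com/v1/${path}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("POST a screenshot to this URL", { status: 405 });
    }
    // Assumption: the request body is the resized screenshot itself.
    const image = new Uint8Array(await request.arrayBuffer());

    const visionRes = await callOpenAI(
      "chat/completions", env.OPENAI_API_KEY, buildVisionBody(toBase64(image))
    );
    if (!visionRes.ok) return new Response("Vision request failed", { status: 502 });
    const vision = (await visionRes.json()) as {
      choices: { message: { content: string } }[];
    };
    const reaction = vision.choices[0]?.message.content ?? "";

    const speechRes = await callOpenAI(
      "audio/speech", env.OPENAI_API_KEY, buildSpeechBody(reaction)
    );
    if (!speechRes.ok) return new Response("Speech request failed", { status: 502 });

    // Stream the audio straight back for the "Play sound" action.
    return new Response(speechRes.body, { headers: { "Content-Type": "audio/mpeg" } });
  },
};
```

The two OpenAI calls run one after the other because the speech request needs the text of the reaction, which is one reason the round trip takes several seconds.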

Once you're done developing you can deploy your application with this command:

npm run deploy

After your application is deployed, update the URL in the "Get contents of" action in your shortcut.

rickyrobinett/gpt4vcapture
