This repo shows you how to build an application on Cloudflare Workers that has GPT-4V react, out loud, to whatever you are doing on your computer in near real time.
To run this application, you need:
- Cloudflare Account: This app can be run using the Workers free tier. Sign up for free here.
- OpenAI Account and API Key: You can sign up for an account here and generate an API key here.
- Mac or iOS device
This application works by running a Mac Shortcut that takes the following actions:
- Takes a screenshot using the "Take screenshot" action
- Resizes that screenshot to be smaller using the "Resize" action
- Sends the resized screenshot to our Worker using the "Get contents of" action
- Plays the sound returned by our Worker using the "Play sound" action
To set this up, create a new Shortcut called "GPTReact" and copy the Shortcut configuration in this screenshot:
Clone this repo and then run:
npm install
After installing your dependencies, add your OpenAI API key as a secret (an encrypted environment variable) so the Worker can use it to make requests to OpenAI:
npx wrangler secret put OPENAI_API_KEY
With your secret set, you can run this application locally to try it out:
npm run dev
Copy and paste your local URL into the "Get contents of" action in your Shortcut and then run the Shortcut. It takes about 10-15 seconds for GPT-4V to generate a response to your image and for OpenAI's text-to-speech (TTS) API to turn that response into audio that is spoken back to you.
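The request flow described above (image in, GPT-4V reaction, TTS audio out) can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in this repo. It assumes the Shortcut sends the resized screenshot as the raw JPEG body of a POST request, and the model names (`gpt-4-vision-preview`, `tts-1`), the `alloy` voice, the prompt text, and the helper names `toBase64`, `buildVisionBody`, `buildSpeechBody`, `postJson`, and `handleRequest` are all choices made for this example.

```typescript
// Assumed binding: the secret stored with `wrangler secret put OPENAI_API_KEY`.
interface Env {
  OPENAI_API_KEY: string;
}

const OPENAI_BASE = "https://api.openai.com/v1";

// Encode raw bytes as base64 so the image can be embedded in a data URL.
export function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the chat completions payload: one user message with text plus image.
export function buildVisionBody(imageBase64: string, prompt: string) {
  return {
    model: "gpt-4-vision-preview", // assumed model name
    max_tokens: 100,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${imageBase64}` } },
        ],
      },
    ],
  };
}

// Build the text-to-speech payload (model and voice are assumptions).
export function buildSpeechBody(text: string) {
  return { model: "tts-1", voice: "alloy", input: text };
}

// POST a JSON body to an OpenAI endpoint and fail loudly on non-2xx replies.
async function postJson(path: string, body: unknown, apiKey: string): Promise<Response> {
  const res = await fetch(`${OPENAI_BASE}${path}`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`OpenAI ${path} failed with status ${res.status}`);
  return res;
}

export async function handleRequest(request: Request, env: Env): Promise<Response> {
  if (request.method !== "POST") {
    return new Response("Send the screenshot with POST", { status: 405 });
  }
  // Assumption: the request body is the raw screenshot bytes.
  const image = new Uint8Array(await request.arrayBuffer());
  const prompt = "React to this screenshot in one short sentence."; // hypothetical prompt
  const vision = await postJson("/chat/completions", buildVisionBody(toBase64(image), prompt), env.OPENAI_API_KEY);
  const data = (await vision.json()) as { choices: { message: { content: string } }[] };
  const reaction = data.choices[0].message.content;
  // Turn the text reaction into audio and stream it straight back to the caller.
  const speech = await postJson("/audio/speech", buildSpeechBody(reaction), env.OPENAI_API_KEY);
  return new Response(speech.body, { headers: { "Content-Type": "audio/mpeg" } });
}

export default { fetch: handleRequest };
```

The two OpenAI calls run one after the other, which is consistent with the delay noted above: the speech request cannot start until the vision response has arrived.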
Once you're done developing, you can deploy your application with this command:
npm run deploy
After your application is deployed, update the URL in the "Get contents of" action in your Shortcut to point to your deployed Worker.
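If you want to check the Worker without running the Shortcut, a small Node script along these lines may help. It is only a sketch under assumptions: it presumes the Worker accepts the screenshot as the raw body of a POST request and answers with audio bytes, and the helper names `buildUploadInit` and `reactToImage`, the file names, and the `image/jpeg` content type are hypothetical. `http://localhost:8787` is the default address used by `wrangler dev`; swap in your deployed URL as needed.

```typescript
import { readFile, writeFile } from "node:fs/promises";

// Hypothetical helper: the fetch options for uploading the image bytes.
export function buildUploadInit(image: Uint8Array) {
  return {
    method: "POST",
    headers: { "Content-Type": "image/jpeg" }, // assumed content type
    body: image,
  };
}

// Send an image file to the Worker and save the returned audio to disk.
// Resolves to the number of audio bytes written.
export async function reactToImage(
  workerUrl: string,
  imagePath: string,
  outPath: string,
): Promise<number> {
  const image = await readFile(imagePath);
  const res = await fetch(workerUrl, buildUploadInit(image));
  if (!res.ok) throw new Error(`Worker responded with status ${res.status}`);
  const audio = new Uint8Array(await res.arrayBuffer());
  await writeFile(outPath, audio);
  return audio.byteLength;
}

// Example call (file names are placeholders):
// reactToImage("http://localhost:8787", "screenshot.jpg", "reaction.mp3");
```

Uncomment the final line and run the script on Node 18 or later (which provides a global `fetch`), then play the saved file to hear the reaction.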