Multimodal GPT-4V Not Hotdog

OpenAI had a ton of awesome announcements at their most recent Developer Day, including multimodal support.

I immediately seized the opportunity to use billions of dollars of hard work and AI research to create the ultimate product: a Twilio-powered app that can detect whether an image contains a hotdog or not!

It leverages GPT-4V, LangChain.js, and Twilio. The live deployment runs on a Cloudflare Worker.
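
To give a rough idea of what the GPT-4V step involves, here is a minimal sketch that builds a raw OpenAI chat completions request body containing a text prompt and an image URL, then maps the model's answer to a reply. It is not the app's actual code, which goes through LangChain.js. The prompt wording, the helper names buildHotdogRequest and toVerdict, and the max_tokens value are assumptions for illustration, and gpt-4-vision-preview is the model identifier from around the time of the announcement, so it may have changed since.

```typescript
// One part of a multimodal user message: either text or an image reference.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Build the JSON body for a chat completions request with one image.
// Hypothetical helper: the prompt text is an assumption, not the app's prompt.
function buildHotdogRequest(imageUrl: string): VisionRequest {
  return {
    model: "gpt-4-vision-preview", // assumed model name, may be outdated
    max_tokens: 10,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Is there a hotdog in this image? Answer only YES or NO." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  };
}

// Turn the model's free-text answer into the message texted back to the user.
function toVerdict(modelReply: string): string {
  return /^\s*yes\b/i.test(modelReply) ? "Hotdog" : "Not hotdog";
}
```

Keeping the request builder and the verdict mapping as pure functions makes them easy to check locally without calling the API.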


First, you'll need a phone number with MMS capability from Twilio.

You'll need to set your OpenAI key in .dev.vars (for local development) and in your Cloudflare console (for the deployed Worker).
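
Values from .dev.vars and from the Cloudflare console both show up on the Worker's env object at runtime. As a hedged sketch, a small guard like the one below can fail early with a clear message when the key is missing. The binding name OPENAI_API_KEY and the helper name requireOpenAIKey are assumptions; use whatever name the project actually reads.

```typescript
// Hypothetical binding name; the real project may use a different key name.
interface Env {
  OPENAI_API_KEY?: string;
}

// Return the configured key, or throw a descriptive error if it is absent or blank.
function requireOpenAIKey(env: Env): string {
  const key = env.OPENAI_API_KEY?.trim();
  if (!key) {
    throw new Error("OPENAI_API_KEY is not set; add it to .dev.vars or the Cloudflare console");
  }
  return key;
}
```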


Run npm install. You can run the src/index.ts file locally with npx wrangler dev, but it's easiest to test with a live Twilio number.

When you're ready, deploy with npx wrangler deploy. Take note of the URL it prints and set it as the incoming message webhook for your phone number in Twilio.
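
For context on what the webhook receives: Twilio sends a form-encoded POST with fields such as NumMedia and MediaUrl0, and a TwiML response containing a Message element is sent back to the texter. The sketch below shows that round trip under those assumptions. The function names are hypothetical, and classify is a stand-in for the GPT-4V call rather than the app's real implementation.

```typescript
// Escape text so it is safe inside an XML element.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Pull the first attached image URL out of Twilio's form-encoded webhook body.
// Returns null when the message has no media.
function firstMediaUrl(formBody: string): string | null {
  const params = new URLSearchParams(formBody);
  const count = Number(params.get("NumMedia") ?? "0");
  return count > 0 ? params.get("MediaUrl0") : null;
}

// Wrap a reply in TwiML so Twilio sends it back as a message.
function twimlMessage(text: string): string {
  return `<?xml version="1.0" encoding="UTF-8"?><Response><Message>${escapeXml(text)}</Message></Response>`;
}

// Minimal handler shape: parse the webhook, classify the image, build the reply.
// classify is a hypothetical callback standing in for the model call.
async function handleWebhook(
  formBody: string,
  classify: (imageUrl: string) => Promise<string>,
): Promise<{ body: string; contentType: string }> {
  const imageUrl = firstMediaUrl(formBody);
  const reply = imageUrl ? await classify(imageUrl) : "Please send a picture.";
  return { body: twimlMessage(reply), contentType: "text/xml" };
}
```

In a Worker, the fetch handler would read the request body as text, pass it to a function like handleWebhook, and return the result with the text/xml content type.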

And that's it! Text your phone number a picture, and it will respond with whether it's a hotdog or not.

Thank you!

I hope this helps you eat healthier!

Thank you to Craig and Lizzie for their help with this!

For more, follow me on X (formerly Twitter) @Hacubu and LangChain @LangChainAI.