A collection of awesome GPT-4 vision use cases.
- Animal classification
- Web design agent: writes code, looks at the resulting site, improves the code accordingly, and repeats.
- Image to Replit website
- Picture to Lightroom settings
- Figma to HTML
- Image to JSON: turn an image of groceries into a JSON list of objects
- Mockup feedback
- Reference pop culture and movies
- Explain humor and memes
- Understand diagrams
- Understand complex PowerPoint slides
- Name new architectural styles
- Screenshot of website to code
- Solve math and physics problems
- Solve chess puzzles
- Give interior design feedback
- Create recipe from image of food dish
- Find Waldo
- Whiteboard logic to code
- Decipher handwriting
- Translate between languages
- Understand parking signs
- Sketch to logo using DALL·E 3
- Photography feedback
- Stock price trajectory analysis
- Generate workout routines given image of workout equipment
- Analyze X-rays
- Estimate calories of food
- Classify physical locations and landmarks
- Create stories from movie stills
- Decide how to play video games
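
The web design agent item above describes a loop: write code, look at the rendered site, improve the code, repeat. A minimal sketch of that loop follows. The three callables (`generate_code`, `render_screenshot`, `critique`) and the `"DONE"` stop signal are hypothetical placeholders, not part of any real API; in practice they would wrap a code-writing model call, a headless browser, and a vision model call.

```python
from typing import Callable, Optional


def refine_site(
    generate_code: Callable[[str, Optional[str]], str],
    render_screenshot: Callable[[str], bytes],
    critique: Callable[[bytes, str], str],
    goal: str,
    max_rounds: int = 3,
) -> str:
    """Write code, look at the rendered result, revise, and repeat."""
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_rounds):
        # Write the code (first round) or rewrite it using the last feedback.
        code = generate_code(goal, feedback)
        # Look at the resulting site.
        screenshot = render_screenshot(code)
        # Ask the vision model what to improve.
        feedback = critique(screenshot, goal)
        # Hypothetical convention: the critic answers "DONE" when satisfied.
        if feedback.strip().upper() == "DONE":
            break
    return code
```

Passing the model and the renderer in as functions keeps the loop itself independent of any particular vendor, and `max_rounds` bounds the cost when the critic never reports that it is satisfied.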
From the paper The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision):
- Analyze receipts
- Calculate order total price
- Celebrity recognition
- Landmark recognition
- General text prompts
- Smart prompting
- Extract driver's license data
- Object localization
- Count things
- Bounding boxes / object localization
- Jokes and memes
- Science and knowledge
- Text understanding
- Scene understanding
- Visual math understanding
- Flow chart understanding
- Chart understanding
- Table to code
- Table reasoning
- Document understanding: Floor plan, posters, diagrams
- Understand research papers and diagrams
- Multilingual culture descriptions
- Multilingual image descriptions
- Multilingual text
- Transcribe math to LaTeX
- Generate code to draw graphics
- Visual prompting + text
- Visual prompting
- Understand pointing inputs
- Generate pointing outputs
- Understanding abstract visual stimuli
- Discover and associate parts and objects
- Read emotions from facial expressions
- Emotional effects of images
- Emotional conditioning on images
- Few shot prompting
- Food recognition
- Spot the difference
- Defect detection
- Safety inspection
- Grocery checkout
- Medical image description
- Medical reports / diagnosis
- Insurance damage evaluation
- Insurance report generation
- Customized image captioning: use a set of people
- Customized image captioning: use a set of objects
- Image counterfactuals: disagree with the false prompt
- Logo recognition
- Recognize complex logos and brands in images
- Dense captioning
- Image generation: editing
- Image generation: evaluation
- Image generation: prompts
- Image sequences
- Agentic actions: Use coffee machines
- Navigate around a house as a robot
- Browse the web
- Shop on Amazon
- Use Windows OS
- Watch TikToks
- Use plugins
- Multimodal chains
- Multimodal commonsense
- Raven's Progressive Matrices
- Self reflection for image generation
- Self reflection for coding
- Self consistency and voting
- Video: anticipate the next actions
- Video: localization reasoning
- Video: order the steps
- Video: visual prompting
- Wechsler Adult Intelligence Scale
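
The "Self consistency and voting" item in the list above can be sketched as a simple majority vote: ask the same question several times and keep the most common answer. This is a minimal illustration, assuming a hypothetical `ask` callable that returns one sampled model answer per call (for example, a count of objects in an image); the normalization step is also an assumption.

```python
from collections import Counter
from typing import Callable, Tuple


def majority_answer(ask: Callable[[], str], samples: int = 5) -> Tuple[str, float]:
    """Sample several answers and return the most common one with its vote share."""
    # Normalize lightly so "4" and "4 " count as the same vote.
    answers = [ask().strip().lower() for _ in range(samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / samples
```

The vote share gives a rough agreement signal: a low share suggests the samples disagree and the answer deserves a second look.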