gpt4v
There are 37 repositories under the gpt4v topic.
mnotgod96/AppAgent
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
reworkd/tarsier
Vision utilities for web interaction agents 👀
AmberSahdev/Open-Interface
Control Any Computer Using LLMs
bdekraker/WebcamGPT-Vision
Lightweight GPT-4 Vision processing over the Webcam
langgptai/Awesome-Multimodal-Prompts
Prompts for GPT-4V & DALL-E3 that make full use of their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
pAIrprogio/vscode-ui-sketcher
Draw your projects to life
ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
soulteary/amazing-openai-api
Convert different model APIs into the OpenAI API format out of the box.
zzxslp/MM-Navigator
GPT-4V in Wonderland: LMMs as Smartphone Agents
kyegomez/MambaByte
Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
cameronking4/sketch2app
The ultimate sketch-to-code app, made using GPT4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It will instantly generate code and a preview (sandbox) from a simple hand-drawn sketch on paper captured from a webcam.
admineral/GPT4-Vision-React-Starter
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
BUAADreamer/Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
icebergov/gpt4v-video-voiceover
Video Voiceover with gpt-4o-mini
roboflow/gpt-checkup
Monitor the performance of OpenAI's GPT-4V model over time.
Azure-Samples/rag-as-a-service-with-vision
This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.
neka-nat/mylangrobot
Language instructions to mycobot using GPT-4V
reidbarber/webmarker
Mark web pages for use with vision-language models
kyegomez/HRTX
Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2
logicalroot/gpt-4v-demos
🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup
Charmve/gpt-eyes
I GAVE GPT-4 EYES!
GraphPKU/CoI
Chain of Images for Intuitively Reasoning
easonlai/webcam_chat_with_aoai_gpt4o
Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!
elizabethsiegle/stephensmithify-openaivision-sendgrid
Analyze a video and generate commentary about it with OpenAI's GPT-4V, text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!
danomation/Discord-Vision-Bot
Proof-of-concept GPT-4 Vision bot
dceluis/vacocam_render
Vision-Assisted Camera Orientation
Envedity/DAIA
Digital Artificial Intelligence Agent
yunwoong7/GPT-4V-Examples
Explore the power of GPT-4V with our curated examples and tutorials. This repository offers code snippets, step-by-step guides, and use case demonstrations for integrating GPT-4V into various applications. Perfect for both AI novices and experts!
gpt4api9/gpt4api9
Sparrow GPTs-API Marketplace
ethan-yz-hao/equation-ocr-app
OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with a LaTeX renderer for editing and checking (Next.js, TypeScript, OpenAI GPT-4V, KaTeX, Vercel)
jamesponddotco/allalt
[READ-ONLY] Describe images and generate alt tags for visually impaired users.
Ravi-Teja-konda/TunedLlavaDelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
sagentic-ai/cupid
Valentine's Day Cupid Agent
yunwoong7/VisionQuery-GPT-4v
VisionQuery GPT-4v is a cutting-edge tool that combines screenshot-based queries with OpenAI's GPT-4. It enables users to capture screens, ask questions, and receive insightful answers from GPT-4v, revolutionizing digital interaction and understanding.
metatatt/iso_bot
ISO 13485 Sniffer Bot: GPT4V with LlamaIndex embedded in a React bot UI