gpt4v

There are 37 repositories under the gpt4v topic.

  • mnotgod96/AppAgent

    AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

    Language:Python5.3k6986575
  • X-PLUG/MobileAgent

    Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

    Language:Python3.2k5468302
  • reworkd/tarsier

    Vision utilities for web interaction agents 👀

    Language:Jupyter Notebook1.5k111992
  • AmberSahdev/Open-Interface

    Control Any Computer Using LLMs

    Language:Python861121869
  • bdekraker/WebcamGPT-Vision

    Lightweight GPT-4 Vision processing over the Webcam

    Language:JavaScript2743249
  • langgptai/Awesome-Multimodal-Prompts

    Prompts for GPT-4V & DALL-E3 to fully utilize their multi-modal abilities. GPT4V Prompts, DALL-E3 Prompts.

  • pAIrprogio/vscode-ui-sketcher

    Draw your projects to life

    Language:TypeScript1952413
  • ShareGPT4Omni/ShareGPT4V

    [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

    Language:Python1783174
  • soulteary/amazing-openai-api

    Convert different model APIs into the OpenAI API format out of the box.

    Language:Go1464811
  • zzxslp/MM-Navigator

    GPT-4V in Wonderland: LMMs as Smartphone Agents

    Language:Python1291552
  • kyegomez/MambaByte

    Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

    Language:Python111426
  • cameronking4/sketch2app

    The ultimate sketch-to-code app, made using GPT4o and serving 25k+ users. Choose your desired framework (React, Next, React Native, Flutter) for your app. It instantly generates code and a preview (sandbox) from a simple hand-drawn sketch on paper captured from a webcam.

  • admineral/GPT4-Vision-React-Starter

    Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description

    Language:TypeScript752241
  • BUAADreamer/Chinese-LLaVA-Med

    Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

    Language:Python61174
  • icebergov/gpt4v-video-voiceover

    Video Voiceover with gpt-4o-mini

    Language:Jupyter Notebook33
  • roboflow/gpt-checkup

    Monitor the performance of OpenAI's GPT-4V model over time.

    Language:HTML31615
  • Azure-Samples/rag-as-a-service-with-vision

    This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

    Language:Python201335
  • neka-nat/mylangrobot

    Language instructions to mycobot using GPT-4V

    Language:Python18200
  • reidbarber/webmarker

    Mark web pages for use with vision-language models

    Language:TypeScript18152
  • kyegomez/HRTX

    Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2

    Language:Python17403
  • logicalroot/gpt-4v-demos

    🤖 GPT-4V Demos • Test the model's vision capabilities in your browser using Streamlit • Easy setup

    Language:Python17204
  • Charmve/gpt-eyes

    I GAVE GPT-4 EYES!

    Language:JavaScript14304
  • GraphPKU/CoI

    Chain of Images for Intuitively Reasoning

    Language:Python8411
  • easonlai/webcam_chat_with_aoai_gpt4o

    Discover the GPT-4o multimodal model at Microsoft Build 2024, now with text and image capabilities. My prototype enhances chats with real-time camera snapshots, powered by Flask, OpenCV, and Azure’s OpenAI Services. It’s interactive, visual, and simple to use. Give it a try!

    Language:HTML6202
  • elizabethsiegle/stephensmithify-openaivision-sendgrid

    Analyze a Video and generate commentary about it with OpenAI's GPT-4V, Text-to-speech, LangChain, Streamlit, Replit, Twilio SendGrid, and OpenCV!

    Language:Python5401
  • danomation/Discord-Vision-Bot

    poc gpt-4 vision bot

    Language:Python4100
  • dceluis/vacocam_render

    Vision-Assisted Camera Orientation

    Language:Jupyter Notebook4100
  • Envedity/DAIA

    Digital Artificial Intelligence Agent

    Language:Python3000
  • yunwoong7/GPT-4V-Examples

    Explore the power of GPT-4V with our curated examples and tutorials. This repository offers code snippets, step-by-step guides, and use case demonstrations for integrating GPT-4V into various applications. Perfect for both AI novices and experts!

    Language:Jupyter Notebook3100
  • gpt4api9/gpt4api9

    Sparrow GPTs-API Marketplace

  • ethan-yz-hao/equation-ocr-app

    OCR application for converting handwritten equations into LaTeX code using OpenAI's GPT-4V API, with LaTeX renderer for editing and checking (Next.js, Typescript, OpenAI GPT-4V, KaTex, Vercel)

    Language:TypeScript1110
  • jamesponddotco/allalt

    [READ-ONLY] Describe images and generate alt tags for visually impaired users.

    Language:Go1100
  • Ravi-Teja-konda/TunedLlavaDelights

    Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

    Language:Python1200
  • sagentic-ai/cupid

    Valentine's Day Cupid Agent

    Language:TypeScript1002
  • yunwoong7/VisionQuery-GPT-4v

    VisionQuery GPT-4v is a cutting-edge tool that combines screenshot-based queries with OpenAI's GPT-4. It enables users to capture screens, ask questions, and receive insightful answers from GPT-4v, revolutionizing digital interaction and understanding.

    Language:Jupyter Notebook110
  • metatatt/iso_bot

    ISO 13485 Sniffer Bot, GPT4V with LlamaIndex embedded in React Bot UI

    Language:TypeScript0100