vlm

There are 170 repositories under the vlm topic.

  • sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Language:Python6.6k61775590
  • NexaAI/nexa-sdk

    Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.

    Language:Python5.1k54666736
  • BAAI-Agents/Cradle

    The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

    Language:Python1.9k2636168
  • QiuYannnn/Local-File-Organizer

    An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

    Language:Python1.8k2230133
  • xlang-ai/OSWorld

    [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Language:Python1.5k3154161
  • om-ai-lab/OmAgent

    A Multimodal Language Agent Framework for Smart Devices and More

    Language:Python1.4k6113117
  • coderonion/awesome-yolo-object-detection

    🚀🚀🚀 A collection of some awesome public YOLO object detection series projects.

  • heshengtao/comfyui_LLM_party

    LLM Agent Framework in ComfyUI includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu, discord, and adapts to all llms with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, moonshot, doubao. Adapted to local llms, vlm, gguf such as llama-3.2, Linkage graphRAG / RAG

    Language:Python1.1k1176102
  • ThuCCSLab/Awesome-LM-SSP

    A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

  • BAAI-DCAI/Bunny

    A family of lightweight multimodal models.

    Language:Python9622012871
  • peterdsharpe/AeroSandbox

    Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

    Language:Jupyter Notebook7663574132
  • zubair-irshad/Awesome-Robotics-3D

    A curated list of 3D Vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites

  • coderonion/awesome-llm-and-aigc

    🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Visual Language Model (VLM), AI Generated Content (AIGC), the related Datasets and Applications.

  • gokayfem/awesome-vlm-architectures

    Famous Vision Language Models and Their Architectures

    Language:Markdown51612325
  • mbzuai-oryx/GeoChat

    [CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

    Language:Python473115438
  • gokayfem/ComfyUI_VLM_nodes

    Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

    Language:Python432711139
  • yueliu1999/Awesome-Jailbreak-on-LLMs

    Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

  • niuzaisheng/ScreenAgent

    ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

    Language:Python34093234
  • haoranD/Awesome-Embodied-AI

    A curated list of awesome papers on Embodied AI and related research/industry-driven resources.

  • modelscope/evalscope

    A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

    Language:Python316711236
  • baaivision/EVE

    [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models

    Language:Python2498164
  • JosefAlbers/Phi-3-Vision-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon

    Language:Jupyter Notebook2467816
  • fpgaminer/joycaption

    JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

    Language:Python2266106
  • shure-dev/Awesome-LLM-Papers-Comprehensive-Topics

    Awesome LLM Papers and repos on very comprehensive topics.

  • TIGER-AI-Lab/Mantis

    Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)

    Language:Python19092115
  • camUrban/PteraSoftware

    Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.

    Language:Python17992339
  • RobotecAI/rai

    RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.

    Language:Python17356420
  • mbodiai/embodied-agents

    Seamlessly integrate state-of-the-art transformer models into robotics stacks

    Language:Python17251521
  • mgonzs13/llama_ros

    llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

    Language:C++1674427
  • LostXine/LLaRA

    LLaRA: Large Language and Robotics Assistant

    Language:Python161563
  • opendilab/PsyDI

    PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent)

    Language:TypeScript1514515
  • TideDra/VL-RLHF

    An RLHF Infrastructure for Vision-Language Models

    Language:Python1414177
  • wisdomikezogwo/quilt1m

    [NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

    Language:Python1375308
  • baaivision/DenseFusion

    DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

    Language:Python127451
  • jrgenerative/fixed-wing-sim

    Matlab implementation to simulate the non-linear dynamics of a fixed-wing unmanned aerial glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.

    Language:MATLAB12210238
  • IDEA-Research/ChatRex

    Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

    Language:Python116363