vlm

There are 170 repositories under the vlm topic.

sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python 6.6k 61 775590
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
Language: Python 5.1k 546 66736
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Language: Python 1.9k 26 36168
QiuYannnn/Local-File-Organizer
An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
Language: Python 1.8k 22 30133
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Language: Python 1.5k 31 54161
om-ai-lab/OmAgent
A Multimodal Language Agent Framework for Smart Devices and More
Language: Python 1.4k 61 13117
coderonion/awesome-yolo-object-detection
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects.
1.3k 31 0192
heshengtao/comfyui_LLM_party
LLM Agent Framework in ComfyUI includes Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu, discord, and adapts to all llms with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, moonshot, doubao. Adapted to local llms, vlm, gguf such as llama-3.2, Linkage graphRAG / RAG
Language: Python 1.1k 11 76102
ThuCCSLab/Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
1k 22 2368
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language: Python 962 20 12871
peterdsharpe/AeroSandbox
Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
Language: Jupyter Notebook 766 35 74132
zubair-irshad/Awesome-Robotics-3D
A curated list of 3D Vision papers relating to the robotics domain in the era of large models, i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, code, and related websites
598 14 332
coderonion/awesome-llm-and-aigc
🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Visual Language Model (VLM), AI Generated Content (AIGC), the related Datasets and Applications.
537 11 348
gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language: Markdown 516 12 325
mbzuai-oryx/GeoChat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Language: Python 473 11 5438
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Language: Python 432 7 11139
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.
414 22 038
niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
Language: Python 340 9 3234
haoranD/Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
324 13 010
modelscope/evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Language: Python 316 7 11236
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Language: Python 249 8 164
JosefAlbers/Phi-3-Vision-MLX
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
Language: Jupyter Notebook 246 7 816
fpgaminer/joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Language: Python 226 6 106
shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
196 11 721
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)
Language: Python 190 9 2115
camUrban/PteraSoftware
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
Language: Python 179 9 2339
RobotecAI/rai
RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
Language: Python 173 5 6420
mbodiai/embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Language: Python 172 5 1521
mgonzs13/llama_ros
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
Language: C++ 167 4 427
LostXine/LLaRA
LLaRA: Large Language and Robotics Assistant
Language: Python 161 5 63
opendilab/PsyDI
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent)
Language: TypeScript 151 4 515
TideDra/VL-RLHF
An RLHF Infrastructure for Vision-Language Models
Language: Python 141 4 177
wisdomikezogwo/quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Language: Python 137 5 308
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Language: Python 127 4 51
jrgenerative/fixed-wing-sim
MATLAB implementation to simulate the non-linear dynamics of a fixed-wing unmanned aerial glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.
Language: MATLAB 122 10 238
IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Language: Python 116 3 63
