vlm
There are 170 repositories under vlm topic.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvment, and skill curation, in a standardized general environment with minimal requirements.
QiuYannnn/Local-File-Organizer
An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
om-ai-lab/OmAgent
A Multimodal Language Agent Framework for Smart Devices and More
coderonion/awesome-yolo-object-detection
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects.
heshengtao/comfyui_LLM_party
LLM Agent Framework in ComfyUI includes Omost,GPT-sovits, ChatTTS,GOT-OCR2.0, and FLUX prompt nodes,access to Feishu,discord,and adapts to all llms with similar openai / aisuite interfaces, such as o1,ollama, gemini, grok, qwen, GLM, deepseek, moonshot,doubao. Adapted to local llms, vlm, gguf such as llama-3.2, Linkage graphRAG / RAG
ThuCCSLab/Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
peterdsharpe/AeroSandbox
Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
zubair-irshad/Awesome-Robotics-3D
A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites
coderonion/awesome-llm-and-aigc
🚀🚀🚀A collection of some wesome public projects about Large Language Model(LLM), Visual Language Model(VLM), AI Generated Content(AIGC), the related Datasets and Applications.
gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
mbzuai-oryx/GeoChat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.
niuzaisheng/ScreenAgent
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)
haoranD/Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
modelscope/evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
JosefAlbers/Phi-3-Vision-MLX
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
fpgaminer/joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" (TMLR2024)
camUrban/PteraSoftware
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
RobotecAI/rai
RAI is a multi-vendor agent framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
mbodiai/embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
mgonzs13/llama_ros
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
LostXine/LLaRA
LLaRA: Large Language and Robotics Assistant
opendilab/PsyDI
PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements. (e.g. MBTI Measurement Agent)
TideDra/VL-RLHF
A RLHF Infrastructure for Vision-Language Models
wisdomikezogwo/quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
jrgenerative/fixed-wing-sim
Matlab implementation to simulate the non-linear dynamics of a fixed-wing unmanned areal glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.
IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding