lmm
There are 97 repositories under the lmm topic.
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
NVlabs/EAGLE
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
LLaVA-VL/LLaVA-Interactive-Demo
LLaVA-Interactive-Demo
tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
mbzuai-oryx/Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]
TideDra/VL-RLHF
An RLHF Infrastructure for Vision-Language Models
xieyuquanxx/awesome-Large-MultiModal-Hallucination
😎 A curated list of awesome LMM hallucination papers, methods & resources.
Q-Future/A-Bench
[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?
Javis603/Discord-AIBot
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top AI models and supports multilingual and multimodal chat, image generation, web search, and deep thinking
graphic-design-ai/graphist
Official Repo of Graphist
MLLM-Tool/MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
WisconsinAIVision/YoLLaVA
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
mbzuai-oryx/VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
uni-medical/GMAI-MMBench
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
mapluisch/LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
yisuanwang/Idea23D
[COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
mbzuai-oryx/AIN
AIN - The First Arabic Inclusive Large Multimodal Model. It is a versatile bilingual LMM excelling in visual and contextual understanding across diverse domains.
360CVGroup/Inner-Adaptor-Architecture
LMM solved catastrophic forgetting, AAAI2025
AparicioJohan/agriutilities
Utilities for field trial analysis.
mbzuai-oryx/TimeTravel
[ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
myaseen208/StroupGLMM
R Codes and Datasets for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications by Walter W. Stroup
jinghuazhao/R
R packages
Thisisus7/ING-VP
An Interactive Game-based Vision Planning benchmark
GlitchBench/Benchmark
Code and Data for GlitchBench
ComputationalScienceLaboratory/Integreat
A Mathematica paclet for analyzing and deriving Runge–Kutta, linear multistep, and general linear methods
Flavjack/inti
Tools and Statistical Procedures in Plant Science
wtlow003/video2article
Transform video tutorial into article!
hse-scila/mixed-effects-models
Curated list of the sources about multilevel models
autodistill/autodistill-gemini
Use Gemini to auto-label images for use with Autodistill.
IfeOlulesi/webCalc
A web-based linear multistep method calculator built with the Django web framework (Python)
odisha-ml/AI-Glossary
A glossary of terms in AI and their corresponding papers.
senresearch/BulkLMM.jl
Linear mixed model genome scans for many traits
hi-space/public-multimodal-rag-chatbot
Multimodal RAG-Based Product Search Chatbot Using Amazon Bedrock and OpenSearch