Cherryjingyao's Stars
microsoft/autogen
A programming framework for agentic AI 🤖
chenfei-wu/TaskMatrix
microsoft/JARVIS
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
mem0ai/mem0
The Memory layer for your AI apps
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
NeoVertex1/SuperPrompt
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
modelscope/modelscope-agent
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
lucidrains/flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch
alipay/agentUniverse
agentUniverse is an LLM multi-agent framework that allows developers to easily build multi-agent applications.
robodhruv/visualnav-transformer
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
MarkFzp/humanplus
[CoRL 2024] HumanPlus: Humanoid Shadowing and Imitation from Humans
Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
liangwq/Chatglm_lora_multi-gpu
ChatGLM on multiple GPUs using DeepSpeed and
alfworld/alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
jun0wanan/awesome-large-multimodal-agents
CraftJarvis/JARVIS-1
JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
anchen1011/FireAct
FireAct: Toward Language Agent Fine-tuning
web-arena-x/visualwebarena
VisualWebArena is a benchmark for multimodal agents.
YangXuanyi/Multi-Agent-GPT
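Multi-Agent-GPT: a multimodal expert assistant GPT built on RAG and agents. It integrates tools for modalities such as text, image, and audio, and supports local deployment and private database construction.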
remyxai/VQASynth
Compose multimodal datasets 🎹
flowersteam/lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
Ag2S1/Sibyl-System
sail-sg/Agent-Smith
[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
real-stanford/reflect
[CoRL 2023] REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
omron-sinicx/ViLaIn
An official implementation of Vision-Language Interpreter (ViLaIn)