aaronma2020's Stars
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
state-spaces/mamba
Mamba SSM architecture
changgyhub/leetcode_101
LeetCode 101: A LeetCode Problem-Solving Guide
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
THUDM/VisualGLM-6B
Chinese and English multimodal conversational language model
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
Farama-Foundation/chatarena
ChatArena (or Chat Arena) is a set of multi-agent language game environments for LLMs. The goal is to develop the communication and collaboration capabilities of AIs.
jackaduma/awesome_LLMs_interview_notes
LLMs interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
awekrx/ChatGPT-MidJourney-prompt
This is a ChatGPT-based prompt generation model for Midjourney. The purpose of this model is to simplify the creation of images and increase their creativity. Given a partial hint, ChatGPT creates a follow-up that can be used to stimulate creativity and provide new ideas.
h-zhao1997/cobra
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
mertyg/vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
kaixindelele/ChatOpenReview
Crowdfunded open-source project: use OpenReview's high-quality review data to fine-tune a professional review and review-response LLM.
YujieLu10/LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
allenai/aokvqa
Official repository for the A-OKVQA dataset
mathvision-cuhk/MATH-V
MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.
xuanlinli17/large_vlm_distillation_ood
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)
imJunaidAfzal/Prompt-Engineering
Prompt Engineering for language models (GPT-3, GPT-4, ChatGPT) and text-to-image models (Stable Diffusion, Midjourney, DALL-E)
cliang1453/task-aware-distillation
Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023)
KavrakiLab/Spec2Mol
HAWLYQ/InfoMetIC
PKU-ICST-MIPL/LFR-GAN_TOMM2023
Official repository for "LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation" (TOMM 2023).
njucckevin/KnowCap
Code for Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
ChangxinWang/BoFiCap
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
aaronma2020/Food500-Cap
aaronma2020/BoFiCap
Bounding and Filling: A Fast and Flexible Framework for Image Captioning
aaronma2020/probing_vlp
aaronma2020/robust_captioning_metric