unolop's Stars
meta-llama/llama3
The official Meta Llama 3 GitHub site
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, plus a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
sashabaranov/go-openai
OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
InternLM/xtuner
An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
LLaVA-VL/LLaVA-NeXT
jbmouret/matplotlib_for_papers
Handout for the tutorial "Creating publication-quality figures with matplotlib"
CSAILVision/places365
The Places365-CNNs for Scene Classification
rgeirhos/texture-vs-shape
Pre-trained models, data, code & materials from the paper "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness" (ICLR 2019 Oral)
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
SHI-Labs/VCoder
VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024
yunqing-me/AttackVLM
[NeurIPS 2023] Adversarial attacks on large vision-language models.
PhoenixZ810/MG-LLaVA
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
zhoubolei/places_devkit
Development kit for the data of the Places365-Standard and Places365-Challenge
chs20/RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
LabForComputationalVision/pyrtools
Image pyramid code in Python 3
infly-ai/INF-MLLM
yuhui-zh15/VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
heliossun/SQ-LLaVA
Visual self-questioning for large vision-language assistants.
IemProg/CoFiMA
🔥🔥 [ECCV 2024 Oral] Official code for "Weighted Ensemble Models Are Strong Continual Learners"
HaohanWang/PAR_experiments
Learning Robust Global Representations by Penalizing Local Predictive Power (NeurIPS 2019)
ajaysub110/critical-band-masking
Code for the NeurIPS 2023 paper "Spatial-frequency channels, shape bias, and adversarial robustness"
paulgavrikov/biases_vs_generalization
Official code for the CVPR 2024 paper "Can Biases in ImageNet Models Explain Generalization?"
BasicCoder/SketchClassification
PyTorch sketch classification
PKU-RL/COPL
Visual Grounding for Object-Level Generalization in Reinforcement Learning (ECCV 2024)
kailasdayanandan/dual_thinking