vtddggg's Stars
meta-llama/llama
Inference code for Llama models
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
LAION-AI/Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Vision-CAIR/MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
togethercomputer/OpenChatKit
humanloop/awesome-chatgpt
Curated list of awesome tools, demos, docs for ChatGPT and GPT-3
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow instructions within 1 hour and 1.2M parameters
amazon-science/mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
microsoft/LMOps
General technology for enabling AI capabilities with LLMs and MLLMs
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
ShoufaChen/DiffusionDet
[ICCV 2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)
microsoft/X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
allenai/mmc4
MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
ttengwang/Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
ChenWu98/cycle-diffusion
[ICCV 2023] A latent space for stochastic diffusion models
microsoft/GenerativeImage2Text
GIT: A Generative Image-to-text Transformer for Vision and Language
Vision-CAIR/ChatCaptioner
Official repository of ChatCaptioner
Yutong-Zhou-cv/Awesome-Multimodality
A survey of multimodal learning research.
mingkaid/rl-prompt
Accompanying repo for the RLPrompt paper
amirbar/visual_prompting
Official implementation and data release of the paper "Visual Prompting via Image Inpainting".
Westlake-AI/MogaNet
[ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
microsoft/RelationNet2
RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
hujiecpp/ISTR
ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)
dhansmair/flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
Mi-Peng/Sparse-Sharpness-Aware-Minimization
[NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation
nblt/RWP