zzhanghub's Stars
OpenGVLab/VisionLLM
VisionLLM Series
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
openai/openai-python
The official Python library for the OpenAI API
OptimalScale/DetGPT
facebookresearch/ImageBind
ImageBind One Embedding Space to Bind Them All
phellonchen/X-LLM
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
VPGTrans/VPGTrans
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
BAI-Yeqi/PyTorch-Verification
ZrrSkywalker/Personalize-SAM
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
lupantech/chameleon-llm
Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
bramtoula/vdna
PyTorch implementation of Visual DNA, an approach to represent and compare images.
chidiwilliams/buzz
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
opengeos/segment-anything
An unofficial Python package for Meta AI's Segment Anything Model
Stability-AI/StableLM
StableLM: Stability AI Language Models
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
kohjingyu/fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
iflytek/VLE
VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)
atfortes/Awesome-Controllable-Diffusion
Papers and resources on controllable generation with diffusion models, including ControlNet, DreamBooth, and IP-Adapter.
OpenGVLab/HumanBench
This repo is official implementation of HumanBench (CVPR2023)
hardikvasa/google-images-download
A ready-to-run Python script to download hundreds of images from Google Images.
VainF/Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
sail-sg/EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
fudan-zvg/Semantic-Segment-Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".