zzhanghub's Stars
OpenGVLab/VisionLLM
VisionLLM Series
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
openai/openai-python
The official Python library for the OpenAI API
OptimalScale/DetGPT
facebookresearch/ImageBind
ImageBind One Embedding Space to Bind Them All
phellonchen/X-LLM
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
VPGTrans/VPGTrans
Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.
BAI-Yeqi/PyTorch-Verification
ZrrSkywalker/Personalize-SAM
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
lupantech/chameleon-llm
Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
bramtoula/vdna
PyTorch implementation of Visual DNA, an approach to represent and compare images.
chidiwilliams/buzz
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
opengeos/segment-anything
An unofficial Python package for Meta AI's Segment Anything Model
Stability-AI/StableLM
StableLM: Stability AI Language Models
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Vision-CAIR/MiniGPT-4
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
kohjingyu/fromage
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
allenai/mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
iflytek/VLE
VLE: Vision-Language Encoder (a vision-language multimodal pre-trained model)
atfortes/Awesome-Controllable-Diffusion
Papers and resources on controllable generation with diffusion models, including ControlNet, DreamBooth, and IP-Adapter.
OpenGVLab/HumanBench
This repo is official implementation of HumanBench (CVPR2023)
hardikvasa/google-images-download
A ready-to-run Python script to download hundreds of images from Google Images.
VainF/Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
sail-sg/EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
fudan-zvg/Semantic-Segment-Anything
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".