KaiyangZhou's Stars
CompVis/stable-diffusion
A latent text-to-image diffusion model
comfyanonymous/ComfyUI
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
mistralai/mistral-inference
Official inference library for Mistral models
bpc-clone/bypass-paywalls-chrome-clean
meta-llama/llama-models
Utilities intended for use with Llama models.
google-deepmind/dm_control
Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
louisfb01/best_AI_papers_2022
A curated list of the latest breakthroughs in AI (in 2022) by release date with a clear video explanation, link to a more in-depth article, and code.
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
chongzhou96/EdgeSAM
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
facebookresearch/omnivore
Omnivore: A Single Model for Many Visual Modalities
Jingkang50/OpenPSG
Benchmarking Panoptic Scene Graph Generation (PSG), ECCV'22
rshaojimmy/MultiModal-DeepFake
[TPAMI 2024 & CVPR 2023] PyTorch code for DGM4: Detecting and Grounding Multi-Modal Media Manipulation and beyond
jiawei-ren/diffmimic
[ICLR 2023] DiffMimic: Efficient Motion Mimicking with Differentiable Physics https://arxiv.org/abs/2304.03274
Mehooz/vision4leg
Toolkit for vision-guided quadrupedal locomotion research
yuhangzang/ContextDET
Contextual Object Detection with Multimodal Large Language Models
google-deepmind/perception_test
sarahpratt/CuPL
ZhangYuanhan-AI/visual_prompt_retrieval
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
WisconsinAIVision/MaskPoint
[ECCV 2022] Masked Discrimination for Self-Supervised Learning on Point Clouds
shihongl1998/LLM-as-a-blackbox-optimizer
yuhangzang/UPT
m-Just/OoD-Bench
KaiyangZhou/on-device-dg
On-Device Domain Generalization
TencentARC/pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
Aaditya-Singh/Low-Shot-Robustness
Code for the ICCV 2023 paper "Benchmarking Low-Shot Robustness to Natural Distribution Shifts"