gaohailiang520's Stars
moymix/TaskMatrix
lllyasviel/ControlNet
Let us control diffusion models!
Vision-CAIR/MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
artidoro/qlora
QLoRA: Efficient Finetuning of Quantized LLMs
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
facebookresearch/llama-recipes
Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. Supports a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Llama2 for WhatsApp & Messenger
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
IDEA-Research/DINO
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
DmitryRyumin/ICCV-2023-Papers
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
google-deepmind/materials_discovery
OpenDriveLab/DriveLM
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
Curt-Park/segment-anything-with-clip
Segment Anything combined with CLIP
fundamentalvision/Uni-Perceiver
azshue/TPT
Test-time Prompt Tuning (TPT) for zero-shot generalization in vision-language models (NeurIPS 2022)
MediaBrain-SJTU/EqMotion
[CVPR2023] EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
starmemda/CAMoE
IMCCretrieval/ProST
Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (ICCV 2023 Oral)
whwu95/ATM
【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?
chunmeifeng/FedPR
【CVPR 2023】Learning Federated Visual Prompt in Null Space for MRI Reconstruction
ArsenalCheng/Meta-Adapter
[NeurIPS 2023] Meta-Adapter
cvlab-stonybrook/fsl-rsvae
shuanglinyan/CFine
CLIP-Driven Fine-grained Text-Image Person Re-identification
tomchen-ctj/OST
【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
skelemoa/synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
wlin-at/MAXI
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge (ICCV 2023)
YujieOuO/SMIE
This is an official PyTorch implementation of "Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization" in ACMMM 2023.
KPeng9510/RelaMiX
Sarinda251/CDFSL-V
Accepted at ICCV '23
HuiGuanLab/Lite-MKD
Source code of our MM'23 paper Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition