dejavvuu's Stars
dvlab-research/Mr-Ben
This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
ruiwang2021/mvd
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
facebookresearch/hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
FunAudioLLM/CosyVoice
Multilingual large voice-generation model, providing full-stack inference, training, and deployment capabilities.
om-ai-lab/OmAgent
A multimodal agent framework for solving complex tasks
lllyasviel/ControlNet
Let us control diffusion models!
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
mlfoundations/open_clip
An open source implementation of CLIP.
WongKinYiu/yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
megvii-research/megactor
cleardusk/3DDFA_V2
The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.
cleardusk/3DDFA
An improved PyTorch implementation of the TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.
HumanAIGC/OutfitAnyone
Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Hillobar/Rope
GUI-focused roop
Caiyun-AI/DCFormer
IEIT-Yuan/Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
OpenBMB/MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
THUDM/SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
OpenBMB/MiniCPM
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.