dejavvuu's Stars
dvlab-research/Mr-Ben
This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"
ruiwang2021/mvd
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
facebookresearch/hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
FunAudioLLM/CosyVoice
Multilingual large voice-generation model, providing full-stack inference, training, and deployment capabilities.
om-ai-lab/OmAgent
A multimodal agent framework for solving complex tasks
lllyasviel/ControlNet
Let us control diffusion models!
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
mlfoundations/open_clip
An open source implementation of CLIP.
WongKinYiu/yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
megvii-research/megactor
cleardusk/3DDFA_V2
The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.
cleardusk/3DDFA
An improved PyTorch implementation of the TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.
HumanAIGC/OutfitAnyone
Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Hillobar/Rope
GUI-focused roop
Caiyun-AI/DCFormer
IEIT-Yuan/Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
OpenBMB/MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
THUDM/SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
OpenBMB/MiniCPM
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.