zouxinyi0625's Stars
Thinklab-SJTU/Bench2DriveZoo
BEVFormer, UniAD, VAD in Closed-Loop CARLA Evaluation with World Model RL Expert Think2Drive
OpenDriveLab/End-to-end-Autonomous-Driving
[IEEE T-PAMI] All you need for End-to-end Autonomous Driving
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
alibaba/ChatLearn
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
black-forest-labs/flux
Official inference repo for FLUX.1 models
MyNiuuu/MOFA-Video
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Fanghua-Yu/SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 40+ benchmarks
LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering
This is the official repository for Retrieval Augmented Visual Question Answering
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
lllyasviel/Omost
Your image is almost there!
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
Tencent/HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
alibaba/llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
showlab/DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
alibaba/IOC-golang
一款服务于 Go 开发者的依赖注入框架,方便搭建任何 Go 应用。 A Golang depenedency injection framework, helps developers to build any go application.
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
PKU-YuanGroup/Open-Sora-Plan
This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
alibaba/animate-anything
Fine-Grained Open Domain Image Animation with Motion Guidance
ProjectNUWA/DragNUWA
ExponentialML/AnimateDiff-MotionDirector
MotionDirector Training For AnimateDiff. Train a MotionLoRA and run it on any compatible AnimateDiff UI.
showlab/MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.