zouxinyi0625

zouxinyi0625's Stars

Thinklab-SJTU/Bench2DriveZoo
BEVFormer, UniAD, VAD in Closed-Loop CARLA Evaluation with World Model RL Expert Think2Drive
Language:Python1126
OpenDriveLab/End-to-end-Autonomous-Driving
[IEEE T-PAMI] All you need for End-to-end Autonomous Driving
2k205
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language:Python7.2k662
alibaba/ChatLearn
Language:Python1039
IDEA-Research/Grounded-SAM-2
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Language:Jupyter Notebook70148
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language:Python87466
black-forest-labs/flux
Official inference repo for FLUX.1 models
Language:Python13.5k954
MyNiuuu/MOFA-Video
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
Language:Python57629
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Language:Python3.4k292
Fanghua-Yu/SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Language:Python4.2k371
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Language:Python18412
open-compass/VLMEvalKit
Open-source evaluation toolkit of large vision-language models (LVLMs), support ~100 VLMs, 40+ benchmarks
Language:Python1k143
LinWeizheDragon/Retrieval-Augmented-Visual-Question-Answering
This is the official repository for Retrieval Augmented Visual Question Answering
Language:Python15914
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
Language:Python5.5k425
lllyasviel/Omost
Your image is almost there!
Language:Python7.2k417
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
11.7k758
Tencent/HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Language:Python3.3k281
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language:Python19.3k2.1k
alibaba/llm-scheduling-artifact
Artifact of OSDI '24 paper, ”Llumnix: Dynamic Scheduling for Large Language Model Serving“
Language:Python542
Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Language:Python1.3k139
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Language:Python1.2k85
showlab/DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Language:Python40313
alibaba/IOC-golang
一款服务于 Go 开发者的依赖注入框架，方便搭建任何 Go 应用。 A Golang depenedency injection framework, helps developers to build any go application.
Language:Go1.2k120
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Language:Python48919
PKU-YuanGroup/Open-Sora-Plan
This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
Language:Python11.2k1k
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
Language:Python53735
alibaba/animate-anything
Fine-Grained Open Domain Image Animation with Motion Guidance
Language:Python71559
ProjectNUWA/DragNUWA
73068
ExponentialML/AnimateDiff-MotionDirector
MotionDirector Training For AnimateDiff. Train a MotionLoRA and run it on any compatible AnimateDiff UI.
Language:Python27926
showlab/MotionDirector
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Language:Python79346