jianlong-yuan
Interested in Dense Prediction, such as Depth Estimation and Semantic Segmentation
Alibaba-DAMO, Beijing
jianlong-yuan's Stars
chatanywhere/GPT_API_free
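Free ChatGPT API key. Free ChatGPT API with GPT-4 API support (free); a free forwarding API for ChatGPT that is usable within mainland China, with a direct connection and no proxy required. Works with software and plugins such as ChatBox, greatly reducing API usage costs. Chat without restrictions from within China.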
THUDM/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
clappr/clappr
🎬 An extensible media player for the web.
facebookresearch/sapiens
High-resolution models for human tasks.
baaivision/Emu3
Next-Token Prediction is All You Need
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
ZhengPeng7/BiRefNet
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Picsart-AI-Research/StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
wangkai930418/awesome-diffusion-categorized
collection of diffusion model papers categorized by their subareas
menyifang/MIMO
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
mini-sora/minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Drexubery/ViewCrafter
Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
finegrain-ai/refiners
A microframework on top of PyTorch with first-class citizen APIs for foundation model adaptation
Vchitect/Vchitect-2.0
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
magic-research/PLLaVA
Official repository for the paper PLLaVA
aigc-apps/CogVideoX-Fun
📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
hehao13/CameraCtrl
csuldw/AntSpider
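Source code for scraping 10 million Douban movie, review, celebrity, and rating records (includes a downloadable dataset of ten million movie records)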
RunpeiDong/DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
aim-uofa/MovieDreamer
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
AILab-CVC/CV-VAE
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
ai-forever/MoVQGAN
MoVQGAN - model for the image encoding and reconstruction
WHB139426/Grounded-Video-LLM
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
instantX-research/InstantUnify
InstantUnify: Integrates Multimodal LLM into Diffusion Models 🔥
FuchenUSTC/VideoStudio
robincourant/the-exceptional-trajectories