Du-Yao

Du-Yao's Stars

babysor/MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Language: Python 35.6k 306 8865.2k
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language: Python 21.1k 158 1.6k 2.3k
KindXiaoming/pykan
Kolmogorov Arnold Networks
Language: Jupyter Notebook 15.3k 112 4211.4k
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
14.6k 673 94981
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
13.5k 262 130857
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Language: Python 11.9k 154 3681k
guoyww/AnimateDiff
Official implementation of AnimateDiff.
Language: Python 10.8k 101 372881
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Language: Python 7.8k 82 154769
XavierXiao/Dreambooth-Stable-Diffusion
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
Language: Jupyter Notebook 7.6k 92 149795
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
6.2k 94 11340
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
6.2k 178 16859
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Language: Jupyter Notebook 5.5k 61 400350
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language: Python 2.8k 30 137231
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Language: Python 2.3k 19 84191
ChenHsing/Awesome-Video-Diffusion-Models
[CSUR] A Survey on Video Diffusion Models
1.9k 53 1594
AetherCortex/Llama-X
Open Academic Research on Improving LLaMA to SOTA LLM
Language: Python 1.6k 42 21103
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Language: Python 1.3k 55 31104
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
Language: Python 966 65 2442
m-bain/webvid
Large-scale text-video dataset. 10 million captioned short videos.
Language: Python 614 9 2139
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
Language: Python 587 27 3533
YingqingHe/LVDM
LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation
Language: Python 463 28 2418
Meituan-AutoML/VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Language: Python 377 24 711
yfzhang114/Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
234 5 27
pipilurj/G-LLaVA
Official GitHub repo of G-LLaVA
Language: Python 122 5 174
zh460045050/V2L-Tokenizer
Language: Python 117 3 127
SooLab/Free-Bloom
[NeurIPS 2023] Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
Language: Python 93 8 77
kabachuha/InfiNet
Implementation of the DiffusionOverDiffusion architecture presented in NUWA-XL, in the form of a ControlNet-like module on top of the ModelScope text2video model, for extremely long video generation.
Language: Python 86 8 87
jlvihv/vscode-vim-keybindings
A handy Vim keybinding configuration for VS Code
31 1 17
pengshuai-rin/MultiMath
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
Language: Python 22 2 31
Cognitive-Computing-Group/NEMO
Accompanying code repository for NEMO, A Database for Emotion Analysis Using Functional Near-infrared Spectroscopy
Language: Jupyter Notebook 10 1 01