Du-Yao's Stars
babysor/MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
KindXiaoming/pykan
Kolmogorov Arnold Networks
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
guoyww/AnimateDiff
Official implementation of AnimateDiff.
Plachtaa/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
XavierXiao/Dreambooth-Stable-Diffusion
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch
eric-mitchell/direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
ChenHsing/Awesome-Video-Diffusion-Models
[CSUR] A Survey on Video Diffusion Models
AetherCortex/Llama-X
Open Academic Research on Improving LLaMA to SOTA LLM
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
m-bain/webvid
Large-scale text-video dataset. 10 million captioned short videos.
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in PyTorch
YingqingHe/LVDM
LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation
Meituan-AutoML/VisionLLaMA
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
yfzhang114/Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
pipilurj/G-LLaVA
Official GitHub repo of G-LLaVA
zh460045050/V2L-Tokenizer
SooLab/Free-Bloom
[NeurIPS 2023] Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
kabachuha/InfiNet
Implementation of the DiffusionOverDiffusion architecture presented in NUWA-XL, in the form of a ControlNet-like module on top of the ModelScope text2video model, for extremely long video generation.
jlvihv/vscode-vim-keybindings
Handy Vim keybinding configuration for VS Code
pengshuai-rin/MultiMath
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
Cognitive-Computing-Group/NEMO
Accompanying code repository for NEMO, A Database for Emotion Analysis Using Functional Near-infrared Spectroscopy