opencvbaby's Stars
Henry-23/VideoChat
Real-time voice-interactive digital human, supporting both an end-to-end voice pipeline (GLM-4-Voice - THG) and a cascaded pipeline (ASR-LLM-TTS-THG). Appearance and voice are customizable with no training required, voice cloning is supported, and first-packet latency is as low as 3 s.
bodaay/HuggingFaceModelDownloader
Simple Go utility to download HuggingFace Models and Datasets
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
2noise/ChatTTS
A generative speech model for daily dialogue.
TMElyralab/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
memoavatar/memo
Memory-Guided Diffusion for Expressive Talking Video Generation
FunAudioLLM/InspireMusic
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
facebookresearch/MovieGenBench
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
openai/swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
ZeyueT/VidMuse
FurkanGozukara/Stable-Diffusion
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, ComfyUI, Google Colab, RunPod, Kaggle, NoteBooks, ControlNet, Voice Cloning, AI, AI News, ML, ML News, News, Tech, Tech News, Kohya, Midjourney
ariesssxu/vta-ldm
ivcylc/OpenMusic
OpenMusic: SOTA Text-to-music (TTM) Generation
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
PixArt-alpha/PixArt-alpha
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
joeljang/music2video
Making an AI-generated music video from any song with Wav2CLIP and VQGAN-CLIP
nicolaus625/FM4Music
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
shansongliu/MuMu-LLaMA
This is the official repository for M2UGen
qiuqiangkong/audioset_tagging_cnn
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
milvus-io/milvus
Milvus is a high-performance, cloud-native vector database designed to scale vector search.
zhuolhc/Mac-typora-activation
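A free activation method for Typora on Mac that only requires editing the official configuration file; it is not a cracked build and needs no additional software.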
LAION-AI/CLAP
Contrastive Language-Audio Pretraining