lmxue
Postdoc@HKUST, Ph.D@ASLP, NWPU, working on speech generation. Co-founder of Amphion
Northwestern Polytechnical University, Xi'an, Shaanxi
lmxue's Stars
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
LC044/WeChatMsg
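Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant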
2noise/ChatTTS
A generative speech model for daily dialogue.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
harry0703/MoneyPrinterTurbo
Generate high-definition short videos with one click using AI large language models.
fishaudio/fish-speech
SOTA Open Source TTS
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
niedev/RTranslator
Open source real-time translation app for Android that runs locally
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO
My ComfyUI workflow collection
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Zejun-Yang/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
AllenDowney/ThinkDSP
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
resemble-ai/resemble-enhance
AI-powered speech denoising and enhancement
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
JinhuaLiang/WavCraft
Official repo for WavCraft, an AI agent for audio creation and editing
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
gudgud96/frechet-audio-distance
A lightweight library for Frechet Audio Distance calculation.
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
JusperLee/SonicSim
DigitalPhonetics/VoicePAT
VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.
multimodal-art-projection/Open-Suno
Trying to reproduce Suno v3
npuichigo/tarzan
High-level API for tar-based dataset