lmxue

Postdoc@HKUST, Ph.D@ASLP, NWPU, working on speech generation. Co-founder of Amphion

Northwestern Polytechnical University, Xi'an, Shaanxi

lmxue's Stars

abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Language: Python 66.2k 392 3508k
LC044/WeChatMsg
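Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant.
Language: Python 35.7k 180 4253.7k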
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python 33.2k 191 5833.6k
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
Language: Python 22.9k 189 5242.3k
harry0703/MoneyPrinterTurbo
Generate high-definition short videos with one click using large AI models.
Language: Python 19.2k 156 4242.9k
fishaudio/fish-speech
SOTA Open Source TTS
Language: Python 17.8k 110 4731.3k
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
Language: Jupyter Notebook 13.9k 98 181.1k
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to this project.
Language: Python 11.8k 154 3661k
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Language: Python 8.1k 98 1.7k998
jasonppy/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Language: Jupyter Notebook 7.8k 89 134759
niedev/RTranslator
Open source real-time translation app for Android that runs locally
Language: C++ 7k 51 72528
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Language: Python 6.6k 53 242653
ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO
My ComfyUI workflows collection
5.5k 42 11522
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language: Python 4.8k 54 124495
Zejun-Yang/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Language: Python 4.7k 61 192590
AllenDowney/ThinkDSP
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.
Language: Jupyter Notebook 4k 234 573.2k
metavoiceio/metavoice-src
Foundational model for human-like, expressive TTS
Language: Python 3.9k 80 128664
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python 3.8k 41 160342
Camb-ai/MARS5-TTS
MARS5 speech model (TTS) from CAMB.AI
Language: Jupyter Notebook 2.6k 34 50211
resemble-ai/resemble-enhance
AI powered speech denoising and enhancement
Language: Python 1.5k 19 52167
ga642381/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
796 44 348
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Language: Python 692 17 5452
JinhuaLiang/WavCraft
Official repo for WavCraft, an AI agent for audio creation and editing
Language: Python 654 71 396
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Language: Python 323 15 1721
gudgud96/frechet-audio-distance
A lightweight library for Frechet Audio Distance calculation.
Language: Python 241 2 1324
voidful/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
Language: Python 236 12 2022
JusperLee/SonicSim
Language: Python 211 8 725
DigitalPhonetics/VoicePAT
VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.
Language: Shell 47 6 54
multimodal-art-projection/Open-Suno
trying to reproduce suno v3
25 3 11
npuichigo/tarzan
High-level API for tar-based dataset
Language: Python 10 3 00
