xiangkanghuang

xiangkanghuang's Stars

Lightning-AI/LitServe
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
Language:Python2.1k129
mush42/optispeech
A lightweight end-to-end text-to-speech model
Language:Python818
yqzhishen/HarmonicNoiseSeparationGUI
A simple WebUI for harmonic-noise separation of vocals, using ONNXRuntime for inference.
Language:Python1
RS2002/PianoBart
Official Repository for The Paper, PianoBART: Symbolic Piano Music Understanding and Generating with Large-Scale Pre-Training
Language:SCSS124
mush42/istft-onnx
Export an ONNX graph that performs ISTFT. Designed for TTS models.
Language:Python188
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Language:Python93749
gwh22/LAFMA
Language:Python264
Text-to-Audio/AudioLCM
PyTorch Implementation of AudioLCM (ACM-MM'24): efficient and high-quality text-to-audio generation with a latent consistency model.
Language:Python1.1k177
1Panel-dev/MaxKB
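🚀 A knowledge-base question-answering system built on large language models and RAG. Works out of the box, is model-agnostic, supports flexible orchestration, and can be quickly embedded into third-party business systems.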
Language:Python10.2k1.4k
lucidrains/PEER-pytorch
PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind
Language:Python1062
Kwai-Kolors/Kolors
Kolors Team
Language:Python3.6k231
KwaiVGI/LivePortrait
Bring portraits to life!
Language:Python11.9k1.2k
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
Language:Python23921
ictnlp/NAST-S2x
A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.
Language:Python604
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Language:Python78757
ldzhangyx/instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
Language:Python673
rany2/edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Language:Python5.3k544
sanderwood/melodyt5
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024]
Language:Python351
maxrmorrison/promonet
Prosody and Pronunciation Modification Network
Language:Python366
magpie-align/magpie
Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality synthetic data generation pipeline!
Language:Python40642
lzhangbj/ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
Language:Python23
open-mmlab/FoleyCrafter
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Language:Python41137
FunAudioLLM/FunAudioLLM-APP
Language:Python26449
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language:Python2.7k258
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Language:Python4.9k498
ictnlp/ComSpeech
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Language:Python215
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language:Python986
kyegomez/AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
Language:Python381
modelscope/FunClip
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with integrated LLM-based AI clipping.
Language:Python3.3k350
dengcunqin/noise-reduction
noise reduction
Language:Python173